The scalar additive Gaussian noise channel has the "single crossing point" property between the minimum-mean square error (MMSE) in the estimation of the input given the channel output, assuming a Gaussian input to the channel, and the MMSE assuming an arbitrary input. This paper extends the result to the parallel MIMO additive Gaussian channel in three phases: i) The channel matrix is the identity matrix, and we limit the Gaussian input to a vector of Gaussian i.i.d. elements. The "single crossing point" property is with respect to the snr (as in the scalar case). ii) The channel matrix is arbitrary, and the Gaussian input is limited to an independent Gaussian input. A "single crossing point" property is derived for each diagonal element of the MMSE matrix. iii) The Gaussian input is allowed to be an arbitrary Gaussian random vector. A "single crossing point" property is derived for each eigenvalue of the MMSE matrix.

These three extensions are then translated to new information theoretic properties on the mutual information, using the fundamental relationship between estimation theory and information theory. The results of the last phase are also translated to a new property of Fisher's information. Finally, the applicability of all three extensions on information theoretic problems is demonstrated through: a proof of a special case of Shannon's vector EPI, a converse proof of the capacity region of the parallel degraded MIMO broadcast channel (BC) under per-antenna power constraints and under covariance constraints, and a converse proof of the capacity region of the compound parallel degraded MIMO BC under covariance constraint.



I. Introduction

This paper considers parallel multiple-input multiple-output (MIMO) channels, with an arbitrary input distribution
and additive standard Gaussian noise. These channels are a subset of the important family of MIMO additive
Gaussian noise channels, which have been extensively investigated in the literature. For most Gaussian channel 
models studied in information theory, Gaussian signaling happens to be optimal, from point-to-point channels, to 
multiple-access channels (MAC), and broadcast channels (BC) [1, Ch. 9 and 15] [2]. The methods used to prove 
this optimality were not easy to come across, even when considering scalar Gaussian channels. For example, in 
order to prove that Gaussian inputs are optimal for the scalar Gaussian BC, Bergmans employed Shannon's entropy 
power inequality (EPI) [3]. The solution for the MIMO Gaussian BC came only 30 years later in [2], using a new 
enhancement approach. Since then, several other proofs were derived, using different tools, such as the extremal
inequality in [4], the de Bruijn identity in coordination with Dembo's inequality in [5], and the "single crossing
point" property presented by Guo et al. in [6]. The "single crossing point" stemmed from the I-MMSE relationship,
a fundamental relationship between estimation theory and information theory revealed by Guo, Shamai, and Verdú
in [7].

The relationship between estimation theory and information theory goes back to the late 1950's, when Stam [8] used de Bruijn's identity to prove Shannon's EPI, and then to the early 1970's, when the mutual information was represented as a function of the causal filtering error by Duncan [9] and Kadota, Zakai and Ziv [10]. The I-MMSE relationship, given for discrete-time and continuous-time, scalar and vector additive Gaussian noise channels, deepens the connection between these two fields. Specifically, for a scalar additive Gaussian noise channel,

$$Y = \sqrt{\mathsf{snr}}\,X + N \tag{1}$$

where $N$ is standard Gaussian additive noise, then, regardless of the input distribution of $X$, the mutual information, $I(X;Y)$, and the minimum-mean-square error (MMSE) in the estimation of $X$ given the observation $Y$, $\mathsf{mmse}(X,\mathsf{snr})$, are related (assuming real-valued inputs/outputs) by

$$D_{\mathsf{snr}}\, I(X;Y) = \frac{1}{2}\,\mathsf{mmse}(X,\mathsf{snr}) \tag{2}$$

where $D_{\mathsf{snr}}$ is the derivative with respect to $\mathsf{snr}$, and

$$\mathsf{mmse}(X,\mathsf{snr}) = \mathrm{E}\left\{\left(X - \mathrm{E}\{X \mid \sqrt{\mathsf{snr}}\,X + N\}\right)^2\right\}. \tag{3}$$
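
As a quick sanity check of (2), the following minimal sketch compares a finite-difference derivative of the mutual information with half the MMSE for a Gaussian input, where both quantities have well-known closed forms. The function names, the variance value and the step size are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def mutual_info_gaussian(snr, var=1.0):
    """I(X; sqrt(snr) X + N) in nats for X ~ N(0, var) and N ~ N(0, 1)."""
    return 0.5 * np.log1p(snr * var)

def mmse_gaussian(snr, var=1.0):
    """MMSE of X ~ N(0, var) observed through Y = sqrt(snr) X + N."""
    return var / (1.0 + snr * var)

snr, var, step = 2.0, 1.5, 1e-6
# Central finite difference approximating the derivative with respect to snr.
derivative = (mutual_info_gaussian(snr + step, var)
              - mutual_info_gaussian(snr - step, var)) / (2 * step)
half_mmse = 0.5 * mmse_gaussian(snr, var)
print(derivative, half_mmse)
```

The two printed numbers should agree up to the finite-difference error; for non-Gaussian inputs the same check would require numerical evaluation of both sides.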

The work in [7] has been extended in several directions, among which we have: the additive Poisson noise channel 
[11], [12], the general additive noise channel [13], arbitrary channels [14], representation of the relative entropy as 
a function of the difference between the mismatched MMSE and the matched MMSE in [15], [16], and others. One 
important extension, on which we heavily rely, is the one done by Palomar and Verdú in [17], where they obtain
the gradient of the mutual information with respect to different parameters of the MIMO channel. 

Going back to the "single crossing point" property, one of the goals in [6] was to show the applicability of the 
I-MMSE relationship as a tool to solve information-theoretic problems. Specifically, the authors of [6] examined 
the scalar Gaussian BC and gave an alternative proof for the optimality of Gaussian inputs. In order to show this, 
Guo et al. defined the following function in [6]: 

$$f(X,\gamma) = (1+\gamma)^{-1} - \mathsf{mmse}(X,\gamma) \tag{4}$$

where the simplified notation $f(\gamma)$ will be used when there is no confusion about the distribution of $X$. It was shown that $f(\gamma)$ has at most a single crossing point of the horizontal axis. In other words, the first term, which is the MMSE assuming a standard Gaussian input, may be smaller than the second term in some range of snr values (note that the parameter $\gamma$ is the snr); however, once the two terms are equal, at some $\gamma_0$, the MMSE of the standard Gaussian input remains greater than or equal to the MMSE of the arbitrary input for all $\gamma > \gamma_0$, and the function remains nonnegative. This property, together with the I-MMSE relationship, provides the missing link to derive a simple and elegant converse proof of the capacity region of the scalar Gaussian BC.
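
To make the single crossing of the function in (4) concrete, here is a hedged numerical sketch for one particular non-Gaussian input: a binary input taking the values +a and -a with equal probability. It uses the standard closed-form expression for the MMSE of an equiprobable binary input in Gaussian noise (an expectation of a hyperbolic tangent), evaluated on a fixed grid. The amplitude a = 1.5, the grids and all names are assumptions chosen for illustration only.

```python
import numpy as np

# Fixed grid used to average over a standard Gaussian variable Z.
Z = np.linspace(-12.0, 12.0, 24001)
DZ = Z[1] - Z[0]
PDF = np.exp(-0.5 * Z**2) / np.sqrt(2.0 * np.pi)

def mmse_binary(snr, amplitude=1.0):
    """MMSE of X uniform on {-a, +a} observed through Y = sqrt(snr) X + N."""
    s = amplitude**2 * snr
    expected_tanh = np.sum(np.tanh(s + np.sqrt(s) * Z) * PDF) * DZ
    return amplitude**2 * (1.0 - expected_tanh)

def f(snr, amplitude):
    """Standard-Gaussian MMSE minus the MMSE of the binary input, as in (4)."""
    return 1.0 / (1.0 + snr) - mmse_binary(snr, amplitude)

gammas = np.linspace(0.0, 20.0, 401)
values = np.array([f(g, amplitude=1.5) for g in gammas])
signs = np.sign(values)
sign_changes = int(np.sum(signs[:-1] * signs[1:] < 0))
print(values[0], sign_changes, values[-1])
```

Because the binary input has variance larger than one, the function starts negative at zero snr; on this grid it changes sign once and then stays nonnegative, which is the behavior the property describes.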

The "single crossing point" was derived only for the scalar additive Gaussian channel, as can be seen from
the definition of the function $f(\gamma)$. The motivation of this work is to extend this property to the vector Gaussian
channel. This extension is done in three phases. First, we consider random input vectors instead of random input 
scalars, but keep the dependence on a scalar quantity - the channel snr. In this setting we also limit the Gaussian 
input to a vector of Gaussian i.i.d. elements. In the second and third phases we consider dependence on a vector 
quantity - a parallel channel matrix. The difference between the second and third phases is that in the second phase 
we limit the Gaussian input to an independent random vector, whereas in the third phase the Gaussian input is 
arbitrary. In all three phases, the general channel model considered is the following: 

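$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N} \tag{5}$$

where $\mathbf{N}$ is a standard Gaussian random vector, and $\mathbf{H}$ is a square and diagonal channel matrix known to the receiver(s). In the vector case, the scalar MMSE does not capture all the needed information, and we need to resort to the matrix extension, the MMSE matrix defined as

$$\mathbf{E}_X = \mathrm{E}\left\{(\mathbf{X} - \mathrm{E}\{\mathbf{X} \mid \mathbf{H}\mathbf{X}+\mathbf{N}\})(\mathbf{X} - \mathrm{E}\{\mathbf{X} \mid \mathbf{H}\mathbf{X}+\mathbf{N}\})^T\right\} \tag{6}$$

from which we can see that, in general, the MMSE matrix $\mathbf{E}_X$ depends on the channel, $\mathbf{E}_X = \mathbf{E}_X(\mathbf{H})$, but whenever the channel coefficients depend on other parameters, $\mathbf{H} = \mathbf{H}(\phi)$, we will write $\mathbf{E}_X(\phi)$. Observe that the standard scalar MMSE value in the vector case can be easily recovered from the MMSE matrix as follows:

$$\mathsf{mmse}(\mathbf{X},\mathsf{snr}) = \mathrm{E}\left\{\left\|\mathbf{X} - \mathrm{E}\{\mathbf{X} \mid \sqrt{\mathsf{snr}}\,\mathbf{X}+\mathbf{N}\}\right\|^2\right\} = \mathrm{Tr}\left(\mathbf{E}_X(\sqrt{\mathsf{snr}}\,\mathbf{I}_n)\right). \tag{7}$$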



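For the important case when the input distribution of $\mathbf{X}$ is Gaussian with covariance matrix $\mathbf{R}_{X_G}$ we will use the following notation:

$$\mathbf{E}_G(\mathbf{R}_{X_G},\mathbf{H}) = \left(\mathbf{R}_{X_G}^{-1} + \mathbf{H}^T\mathbf{H}\right)^{-1} \tag{8}$$

where we assumed that $\mathbf{R}_{X_G}$ is of full rank. As in the case of $\mathbf{E}_X$, whenever the channel coefficients depend on other parameters, $\mathbf{H} = \mathbf{H}(\phi)$, we will write $\mathbf{E}_G(\mathbf{R}_{X_G},\phi)$. Another important quantity is the MMSE given for a specific output, $\mathbf{Y} = \mathbf{y}$, defined as:

$$\boldsymbol{\Phi}_X(\mathbf{y}) = \mathrm{E}\left\{(\mathbf{X} - \mathrm{E}\{\mathbf{X} \mid \mathbf{y}\})(\mathbf{X} - \mathrm{E}\{\mathbf{X} \mid \mathbf{y}\})^T \mid \mathbf{y}\right\}. \tag{9}$$

Although not specified explicitly, $\boldsymbol{\Phi}_X(\mathbf{y})$ depends on the channel matrix/parameters. Note that $\mathbf{E}_X(\mathbf{H}) = \mathrm{E}\{\boldsymbol{\Phi}_X(\mathbf{Y})\}$. Interestingly, when the input distribution of $\mathbf{X}$ is Gaussian, $\boldsymbol{\Phi}_X(\mathbf{y})$ is independent of $\mathbf{y}$ and the following equality holds for all $\mathbf{y}$: $\mathbf{E}_G(\mathbf{R}_{X_G},\mathbf{H}) = \boldsymbol{\Phi}_X(\mathbf{y})$. Finally, given all these quantities we can define the main player in this work: the MMSE matrix difference (analogous to $f(\gamma)$ in the scalar case)

\begin{align}
\mathbf{Q}(\mathbf{X},\mathbf{R}_{X_G},\phi) &= \mathbf{E}_G(\mathbf{R}_{X_G},\phi) - \mathbf{E}_X(\phi) \tag{10} \\
&= \mathbf{E}_{X_G}(\phi) - \mathbf{E}_X(\phi) \tag{11} \\
&= \left(\mathbf{R}_{X_G}^{-1} + (\mathbf{H}(\phi))^T\mathbf{H}(\phi)\right)^{-1} - \mathbf{E}_X(\phi) \tag{12}
\end{align}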

where, similarly to the scalar case in (4), we will use the simplified notation $\mathbf{Q}(\phi)$ when the distribution of $\mathbf{X}$ and the covariance matrix of the Gaussian distribution $\mathbf{R}_{X_G}$ are clear from the context. Note that there is no requirement that the covariance of the random vector $\mathbf{X}$ be equal to $\mathbf{R}_{X_G}$.
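
The Gaussian MMSE matrix in (8) and the difference matrix in (10)-(12) are easy to evaluate numerically when the arbitrary input is itself Gaussian. The sketch below is only an illustration under that assumption: it checks (8) against the familiar posterior-covariance form and evaluates the difference matrix for a Gaussian input whose covariance is twice the reference covariance. The helper names, the random seed and the specific matrices are hypothetical.

```python
import numpy as np

def mmse_matrix_gaussian(cov, H):
    """E_G(R, H) = (R^{-1} + H^T H)^{-1} for a full-rank covariance R, as in (8)."""
    return np.linalg.inv(np.linalg.inv(cov) + H.T @ H)

def mmse_matrix_gaussian_alt(cov, H):
    """Same quantity written as the Gaussian posterior covariance of X given Y."""
    n = H.shape[0]
    gain = cov @ H.T @ np.linalg.inv(H @ cov @ H.T + np.eye(n))
    return cov - gain @ H @ cov

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
R_ref = A @ A.T + np.eye(n)          # reference Gaussian covariance
H = np.diag([0.5, 1.0, 2.0])         # parallel (diagonal) channel

E_ref = mmse_matrix_gaussian(R_ref, H)
# Difference matrix for a Gaussian X with covariance 2 * R_ref;
# the covariance of X is not required to match the reference covariance.
Q = E_ref - mmse_matrix_gaussian(2.0 * R_ref, H)
print(np.linalg.eigvalsh(Q))
```

In this example the input has a larger covariance than the reference Gaussian, so all eigenvalues of the difference matrix come out negative; when the two covariances coincide the difference is the zero matrix.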

As already pointed out, the extension from scalar to vector is done in three phases. In the first phase the dependence remains on a scalar parameter, the snr. This is obtained by setting $\mathbf{H} = \sqrt{\mathsf{snr}}\,\mathbf{I}_n$ in the general MIMO model in (5). We further limit our observation to the comparison of an arbitrary input distribution with the subset of Gaussian random vectors with i.i.d. elements. For this case, we show that the "single crossing point" property extends smoothly to any linear combination of $\mathbf{Q}(\mathbf{X},\mathbf{R}_{X_G},\phi)$ with a positive semidefinite matrix. Although this is the simplest scalar-to-vector extension, the proof is not straightforward. In order to demonstrate the applicability of this result we extend the proof of a special case of Shannon's EPI, done in [6], to the vector case.

Proceeding with the scalar-to-vector extension, we assume that the channel matrix, $\mathbf{H}$, is parallel, thus our dependence is now on a vector parameter. In this setting we have two distinguishable results, given in phases two and three, that cannot be trivially deduced from each other. In phase two, we limit the Gaussian distribution, to which we compare, to any independent Gaussian distribution characterized by its diagonal covariance matrix, $\boldsymbol{\Lambda}_{X_G}$. Under this assumption we show that a "single crossing point" property exists for each and every diagonal element of $\mathbf{Q}(\mathbf{X},\boldsymbol{\Lambda}_{X_G},\phi)$. Together with the I-MMSE relationship, this result provides some interesting properties of the mutual information, and its applicability is demonstrated by providing a simple converse proof for the parallel Gaussian BC capacity region under per-antenna power constraints.

The third phase, which is the main result of this work, does not require any further assumptions (apart from the diagonal channel matrix). That is, we compare an arbitrary input distribution with any general Gaussian input distribution, with covariance $\mathbf{R}_{X_G}$. In this setting we show that a "single crossing point" property exists for each and every eigenvalue of the matrix $\mathbf{Q}(\mathbf{X},\mathbf{R}_{X_G},\phi)$. The applicability of this result is demonstrated with two information-theoretic problems: the converse proof of the parallel Gaussian BC capacity region under covariance constraint and the converse proof of the compound parallel Gaussian BC capacity region under covariance constraint.

Much of this work regards the behavior of functions around zeros, and the existence and number of actual crossings of the horizontal axis. Thus, before proceeding with the technical content of the paper, and in order to make these observations rigorous, we require the next definitions, which will be used throughout the paper.

Definition 1: Given a function $h(t)$ continuous within the neighborhood of $t_0$, we say that a negative-to-nonnegative zero crossing occurs at $t = t_0$ if, and only if, $h(t_0) = 0$ and there exists a positive value $\epsilon$ such that $h(t) < 0$ for $t \in (t_0 - \epsilon, t_0)$ and $h(t) \ge 0$ for $t \in (t_0, t_0 + \epsilon)$.

Definition 2: Given a function $h(t)$ continuous within the neighborhood of $t_0$, we say that a nonnegative-to-negative zero crossing occurs at $t = t_0$ if, and only if, $h(t_0) = 0$ and there exists a positive value $\epsilon$ such that $h(t) \ge 0$ for $t \in (t_0 - \epsilon, t_0)$ and $h(t) < 0$ for $t \in (t_0, t_0 + \epsilon)$.

Similar definitions can be given for positive-to-nonpositive and nonpositive-to-positive zero crossings. Another required definition is the following:

Definition 3: Given a function $h(t)$ continuous within the neighborhood of $t_0$, we say that a negative-zero-positive crossing occurs at $t = t_0$ if, and only if, a negative-to-nonnegative zero crossing occurs at $t = t_0$ and there exists a positive $\delta$ such that $h(t) = 0$ for $t \in (t_0, t_0 + \delta)$ and a nonpositive-to-positive zero crossing occurs at $t_0 + \delta$. Similarly we can define a positive-zero-negative crossing.

The remainder of this paper is organized as follows: Section II considers the first phase of our extension from
scalar-to-vector, in which case the dependence is on the scalar parameter, snr. In Section III we provide the 
framework in which we handle the assumption of a parallel channel matrix, H. This framework is relevant for 
phases two and three of our scalar-to-vector extension. In Section IV we consider phase two of our extension, 
where we limit our observations to an independent Gaussian input distribution. Section V considers phase three, 
where we compare the arbitrary input to any general Gaussian input distribution. 

Notation: Straight boldface denotes multivariate quantities such as vectors (lowercase) and matrices (uppercase). Uppercase italics denotes random variables (boldface if we consider random vectors rather than random variables), and their realizations are represented by lowercase italics. The set of n-dimensional positive semidefinite matrices is denoted by $\mathbb{S}_+^n$. The elements of a matrix $\mathbf{A}$ are represented by $[\mathbf{A}]_{ij}$. The operator $\mathrm{diag}(\mathbf{A})$ represents a column vector with the diagonal entries of matrix $\mathbf{A}$, and $\mathrm{Diag}(\mathbf{a})$ represents a diagonal matrix whose non-zero elements are given by the elements of vector $\mathbf{a}$. The superscript $(\cdot)^T$ denotes the transpose. The operator $\mathrm{Tr}(\cdot)$ denotes the trace function, and $|\cdot|$ denotes the determinant function. The operator $D_{\gamma}\mathbf{A}$ denotes the Jacobian matrix of $\mathbf{A}$ with respect to $\gamma$ [18].

Note that we also consider the conditioned version of the above defined quantities. That is, when the random vector $\mathbf{X}$ depends on the random vector $\mathbf{U}$, we require, for example, a conditioned version for the MMSE and the matrix $\mathbf{Q}$ given for a specific value of $\mathbf{U} = \mathbf{u}$. In this case both quantities depend on an additional parameter $\mathbf{u}$, i.e., $\mathbf{E}_{X|u}(\phi,\mathbf{u})$ and $\mathbf{Q}(\mathbf{X} \mid \mathbf{U} = \mathbf{u}, \mathbf{R}_{X_G}, \phi)$ (the precise definitions are given in Section IV-B).

II. The Scalar MIMO Channel

As pointed out in the introduction, we begin our study with the simplest multivariate extension of the result in [6, Prp. 16], that is, we consider that the scalar random variables involved in the model in (1) become random vectors. In other words, in this section, we consider the following model:

$$\mathbf{Y} = \sqrt{\mathsf{snr}}\,\mathbf{X} + \mathbf{N} \tag{13}$$

where the input random vector $\mathbf{X} \in \mathbb{R}^n$ is arbitrarily distributed and $\mathbf{N} \in \mathbb{R}^n$ follows a standard Gaussian distribution. Observe that (13) is obtained by setting $\mathbf{H} = \sqrt{\mathsf{snr}}\,\mathbf{I}_n$ in the vector model in (5).

Moreover, we further limit our discussion in this section to the comparison with a Gaussian input with i.i.d. elements, i.e., we assume that $\mathbf{R}_{X_G} = \sigma^2\mathbf{I}_n$.

Thus, for the settings in this section, the general MMSE matrix difference function in (12) simplifies to

$$\mathbf{Q}(\mathbf{X},\sigma^2\mathbf{I}_n,\gamma) = \frac{\sigma^2}{1+\sigma^2\gamma}\,\mathbf{I}_n - \mathbf{E}_X(\gamma) \tag{14}$$

where $\gamma$ plays the role of the estimation snr.

A. A Single Crossing Point

Motivated by the "single crossing property" of $f(X,\gamma)$ presented in [6, Prp. 16], an immediate question that comes to mind is "does this property extend to the MIMO scenario?" Our hypothesis was that for the setting in (13) this property will have a simple extension. Thus, we examine the simplest scalar function of the MMSE matrix difference function of (14), that is, we consider some linear combination of it. Accordingly, we define

\begin{align}
q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma) &= \mathrm{Tr}\left(\mathbf{A}\,\mathbf{Q}(\mathbf{X},\sigma^2\mathbf{I}_n,\gamma)\right) \tag{15} \\
&= \frac{\sigma^2}{1+\sigma^2\gamma}\,\mathrm{Tr}(\mathbf{A}) - \mathrm{Tr}\left(\mathbf{A}\,\mathbf{E}_X(\gamma)\right) \tag{16}
\end{align}

where $\mathbf{A}$ is a weighting matrix.

The "single crossing point" property of $f(\gamma)$ extends naturally to the function $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$, for a specific subset of matrices $\mathbf{A}$. This result is given in the next theorem.

Theorem 1: Let $\mathbf{A} \in \mathbb{S}_+^n$ be a positive semidefinite matrix. Then, the function $\gamma \mapsto q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$, defined in (16), has no nonnegative-to-negative zero crossings and, at most, a single negative-to-nonnegative zero crossing in the range $\gamma \in [0,\infty)$.

Moreover, assume $\mathsf{snr}_0 \in [0,\infty)$ is a negative-to-nonnegative crossing point. Then,

1) $q_{\mathbf{A}}(\mathbf{X},\sigma^2,0) \le 0$.

2) $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$ is a strictly increasing function in the range $\gamma \in [0,\mathsf{snr}_0)$.

3) $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma) \ge 0$ for all $\gamma \in [\mathsf{snr}_0,\infty)$.

4) $\lim_{\gamma\to\infty} q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma) = 0$.

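Proof: We start with the following three lemmas that are instrumental for this proof.

Lemma 1: Let $\mathbf{A} \in \mathbb{S}_+^n$ be a positive semidefinite matrix and let the random vector $\mathbf{X} \in \mathbb{R}^n$ be arbitrarily distributed. Then, we can always find a random vector $\tilde{\mathbf{X}} \in \mathbb{R}^n$ such that the number of nonnegative-to-negative and negative-to-nonnegative zero crossings of $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$ is the same as those of $q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\sigma^2,\gamma)$.

Proof: See Appendix A1. ■

Lemma 2: Let $\mathbf{X} \in \mathbb{R}^n$ be a random vector such that $\mathrm{Tr}(\mathbf{R}_X)/n \le \sigma^2$. Then, for every $\gamma \ge 0$, we have

$$\frac{\mathrm{Tr}(\mathbf{E}_X(\gamma))}{n} \le \frac{\sigma^2}{1+\sigma^2\gamma} \tag{17}$$

with equality if and only if $\mathbf{X}$ is a Gaussian vector with i.i.d. elements of variance $\sigma^2$.

Proof: See Appendix A2. ■

Lemma 3: Let $\mathbf{A} \in \mathbb{R}^{n\times n}$ be a square matrix. The derivative of the function $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$ with respect to $\gamma$ is given by

$$D_{\gamma}\, q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma) = \mathrm{Tr}\left(\mathbf{A}\,\mathrm{E}\{\boldsymbol{\Phi}_X^2(\mathbf{Y})\}\right) - \frac{\sigma^4}{(1+\sigma^2\gamma)^2}\,\mathrm{Tr}(\mathbf{A}). \tag{18}$$

Proof: See Appendix A3. ■

With these three lemmas at hand, we are now ready to continue with the proof of Theorem 1.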
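
Since we are assuming that the matrix $\mathbf{A}$ is positive semidefinite and the distribution of $\mathbf{X}$ is arbitrary, from Lemma 1, we see that we can restrict our study of $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$ to that of $q_{\mathbf{I}_n}(\mathbf{X},\sigma^2,\gamma)$. For the sake of simplicity, throughout this proof we will use $q(\sigma^2,\gamma) = q_{\mathbf{I}_n}(\mathbf{X},\sigma^2,\gamma)$.

Now, according to Lemma 2, for the case where $\mathrm{Tr}(\mathbf{R}_X)/n \le \sigma^2$, the function $q(\sigma^2,\gamma)$ has no zeros and the statement in Theorem 1 is true. In addition, if $\mathbf{X}$ is Gaussian distributed with covariance matrix equal to $\sigma^2\mathbf{I}_n$, then $q(\sigma^2,\gamma) = 0$, $\forall\gamma$, which also fulfills Theorem 1.

Thus, from this point, we can assume that $\mathrm{Tr}(\mathbf{R}_X)/n > \sigma^2$ and that $\mathbf{X}$ is not a Gaussian vector with covariance matrix $\sigma^2\mathbf{I}_n$. Now, for $\gamma = 0$ we have $\frac{1}{n}\,q(\sigma^2,0) = \sigma^2 - \mathrm{Tr}(\mathbf{R}_X)/n < 0$ as required.

From the smoothness of $q(\sigma^2,\gamma)$ as a function of $\gamma$, and as done in [6, Prp. 16], in order to prove that no nonnegative-to-negative and at most one negative-to-nonnegative zero crossings of $q(\sigma^2,\gamma)$ can occur, we only need to show that the derivative of $q(\sigma^2,\gamma)$ is positive for all values of $\gamma$ for which $q(\sigma^2,\gamma) < 0$. Observe that $q(\sigma^2,\gamma) < 0$ implies that

$$\frac{\sigma^2}{1+\sigma^2\gamma} < \frac{1}{n}\,\mathrm{Tr}(\mathbf{E}_X(\gamma)) = \frac{1}{n}\,\mathrm{Tr}\left(\mathrm{E}\{\boldsymbol{\Phi}_X(\mathbf{Y})\}\right). \tag{19}$$

Now, particularizing Lemma 3 for $\mathbf{A} = \mathbf{I}_n$, we have that

\begin{align}
D_{\gamma}\, q(\sigma^2,\gamma) &= \mathrm{Tr}\left(\mathrm{E}\{\boldsymbol{\Phi}_X^2(\mathbf{Y})\}\right) - n\,\frac{\sigma^4}{(1+\sigma^2\gamma)^2} \tag{20} \\
&> \mathrm{Tr}\left(\mathrm{E}\{\boldsymbol{\Phi}_X^2(\mathbf{Y})\}\right) - \frac{1}{n}\left(\mathrm{Tr}\left(\mathrm{E}\{\boldsymbol{\Phi}_X(\mathbf{Y})\}\right)\right)^2 \tag{21} \\
&= \mathbf{1}^T\,\mathrm{E}\{\boldsymbol{\Phi}_X(\mathbf{Y}) \circ \boldsymbol{\Phi}_X(\mathbf{Y})\}\,\mathbf{1} - \frac{1}{n}\,\mathbf{1}^T\,\mathrm{E}\{\mathrm{diag}(\boldsymbol{\Phi}_X(\mathbf{Y}))\}\,\mathrm{E}\{\mathrm{diag}(\boldsymbol{\Phi}_X(\mathbf{Y}))^T\}\,\mathbf{1} \tag{22} \\
&\ge \mathbf{1}^T\,\mathrm{E}\left\{\boldsymbol{\Phi}_X(\mathbf{Y}) \circ \boldsymbol{\Phi}_X(\mathbf{Y}) - \frac{1}{n}\,\mathrm{diag}(\boldsymbol{\Phi}_X(\mathbf{Y}))\,\mathrm{diag}(\boldsymbol{\Phi}_X(\mathbf{Y}))^T\right\}\mathbf{1} \tag{23} \\
&\ge 0 \tag{24}
\end{align}

where (21) follows directly from (19); in (22) we have defined $\mathbf{1}$ as the column vector whose entries are all ones, and we used $\circ$ to denote the Schur product; (23) follows from Jensen's inequality; and, finally, (24) follows from [14, Prp. H.9].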

Observe that the inequality in (24), which holds for values of $\gamma$ such that $q(\sigma^2,\gamma) < 0$, also proves the second item in Theorem 1, and the third one follows directly from the nonexistence of nonnegative-to-negative zero crossings. Furthermore, regarding the fourth item, it is clear that $\lim_{\gamma\to\infty} q(\sigma^2,\gamma) = 0$, as both terms in $q(\sigma^2,\gamma)$ tend to zero.

■
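
The last step of the chain of inequalities in the proof rests on a per-realization fact: for a symmetric positive semidefinite matrix, the sum of the squared entries (the trace of its square) is at least the squared trace divided by n. The sketch below checks this on random positive semidefinite matrices. It illustrates only that single step, not the full proof, and the matrix size, seed and number of trials are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
ones = np.ones(n)
gaps = []
for _ in range(1000):
    G = rng.standard_normal((n, n))
    Phi = G @ G.T                      # random positive semidefinite matrix
    d = np.diag(Phi)
    # 1^T (Phi o Phi - (1/n) d d^T) 1 equals Tr(Phi^2) - Tr(Phi)^2 / n.
    M = Phi * Phi - np.outer(d, d) / n
    gaps.append(ones @ M @ ones)
    assert np.isclose(gaps[-1], np.trace(Phi @ Phi) - np.trace(Phi) ** 2 / n)
print(min(gaps))
```

The smallest gap over all trials is nonnegative, in line with the Cauchy-Schwarz inequality applied to the eigenvalues of the matrix.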

Remark 1: Note that the above theorem also holds for the normalized function, $\frac{1}{n}\,q(\sigma^2,\gamma)$. Specifically, for the case of $\mathbf{A} = \mathbf{I}_n$, this is simply the difference between the MMSE of a general Gaussian random variable, with variance $\sigma^2$, and the average MMSE of the n elements of the random vector $\mathbf{X}$.

Remark 2: For negative semidefinite $\mathbf{A}$ it can easily be seen from the proof of Lemma 1 that $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$ has the inverse properties, since it is a mirroring of some $q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\sigma^2,\gamma)$ over the x-axis. This is to say that it has at most a single positive-to-nonpositive zero crossing and, if such a crossing exists, $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$ will be nonnegative at $\gamma = 0$, strictly decreasing up to the crossing, nonpositive after the crossing, and will tend to zero as $\gamma \to \infty$.

Remark 3: For indefinite $\mathbf{A}$, "single crossing point" properties, such as those shown in Theorem 1, do not hold in general.

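B. Application: A Proof of a Special Case of Shannon's Vector EPI

We now show that Theorem 1 can be used to prove a special case of Shannon's EPI [1, Th. 17.7.3], similarly to what was done in [6] for the scalar case. Precisely, we will show that

$$\exp\left(\frac{2}{n}\,h(\mathbf{X}+\mathbf{N})\right) \ge \exp\left(\frac{2}{n}\,h(\mathbf{X})\right) + 2\pi e\,|\mathbf{R}_N|^{1/n} \tag{25}$$

for any independent n-dimensional vectors $\mathbf{X}$ and $\mathbf{N}$ as long as the differential entropy of $\mathbf{X}$ is well-defined and $\mathbf{N}$ is Gaussian distributed with a positive definite covariance matrix $\mathbf{R}_N$.

We define $\mathbf{Z}$ to be an n-dimensional Gaussian vector with covariance $\mathbf{R}_Z = \mathbf{R}_N$ and independent of both $\mathbf{X}$ and $\mathbf{N}$. Thus, without making any assumptions on the covariance matrix of $\mathbf{X}$, we can find an $\alpha \in [0,\infty)$ such that the following equality holds:

$$h(\mathbf{X}) = h(\alpha\mathbf{Z}) = \frac{1}{2}\log\left((2\pi e)^n\,\alpha^{2n}\,|\mathbf{R}_N|\right). \tag{26}$$

Since $\mathbf{R}_N$ is positive definite there exists an invertible matrix $\mathbf{V}$ such that $\mathbf{R}_N = \mathbf{V}\mathbf{V}^T$. Defining $\tilde{\mathbf{X}} = \mathbf{V}^{-1}\mathbf{X}$, $\tilde{\mathbf{Z}} = \mathbf{V}^{-1}\mathbf{Z}$ and $\tilde{\mathbf{N}} = \mathbf{V}^{-1}\mathbf{N}$ we have the following chain of equalities: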

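\begin{align}
\Delta I(\mathsf{snr}) &= I\left(\alpha\mathbf{Z};\sqrt{\mathsf{snr}}\,\alpha\mathbf{Z}+\mathbf{N}\right) - I\left(\mathbf{X};\sqrt{\mathsf{snr}}\,\mathbf{X}+\mathbf{N}\right) \tag{27} \\
&= I\left(\alpha\tilde{\mathbf{Z}};\sqrt{\mathsf{snr}}\,\alpha\tilde{\mathbf{Z}}+\tilde{\mathbf{N}}\right) - I\left(\tilde{\mathbf{X}};\sqrt{\mathsf{snr}}\,\tilde{\mathbf{X}}+\tilde{\mathbf{N}}\right) \tag{28} \\
&= h\left(\sqrt{\mathsf{snr}}\,\alpha\tilde{\mathbf{Z}}+\tilde{\mathbf{N}}\right) - h\left(\sqrt{\mathsf{snr}}\,\tilde{\mathbf{X}}+\tilde{\mathbf{N}}\right) \tag{29} \\
&= \frac{1}{2}\int_0^{\mathsf{snr}}\left(\mathsf{mmse}(\alpha\tilde{\mathbf{Z}},\gamma) - \mathsf{mmse}(\tilde{\mathbf{X}},\gamma)\right)\mathrm{d}\gamma \tag{30} \\
&= \frac{1}{2}\int_0^{\mathsf{snr}}\mathrm{Tr}\left(\mathbf{E}_{\alpha\tilde{Z}}(\gamma) - \mathbf{E}_{\tilde{X}}(\gamma)\right)\mathrm{d}\gamma \tag{31} \\
&= \frac{1}{2}\int_0^{\mathsf{snr}} q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\alpha^2,\gamma)\,\mathrm{d}\gamma \tag{32}
\end{align}

where we have used the mmse function defined in (7) and the integral expression for the entropy function in [7]. Now, from (29) together with (26), it follows that

$$\lim_{\mathsf{snr}\to\infty}\Delta I(\mathsf{snr}) = 0 \tag{33}$$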

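which, from the integral expression in (32), further implies that the (smooth) integrand must have, at least, one zero crossing. However, from Theorem 1, we know that $q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\alpha^2,\gamma)$ can have, at most, one zero crossing. Consequently, in this case, $q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\alpha^2,\gamma)$ must have exactly one zero crossing. Also, from Theorem 1 and (33), we can infer that there exists some $\mathsf{snr}_0 \in (0,\infty)$ such that $q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\alpha^2,\gamma) < 0$, $\forall\gamma \in [0,\mathsf{snr}_0)$, and $q_{\mathbf{I}_n}(\tilde{\mathbf{X}},\alpha^2,\gamma) \ge 0$, $\forall\gamma \in [\mathsf{snr}_0,\infty)$. Thus, it immediately follows that for finite snr, $\Delta I(\mathsf{snr}) \le 0$ and

\begin{align}
\Delta I(\mathsf{snr}) &= I\left(\alpha\mathbf{Z};\sqrt{\mathsf{snr}}\,\alpha\mathbf{Z}+\mathbf{N}\right) - I\left(\mathbf{X};\sqrt{\mathsf{snr}}\,\mathbf{X}+\mathbf{N}\right) \tag{34} \\
&= h\left(\sqrt{\mathsf{snr}}\,\alpha\mathbf{Z}+\mathbf{N}\right) - h\left(\sqrt{\mathsf{snr}}\,\mathbf{X}+\mathbf{N}\right) \le 0. \tag{35}
\end{align}

It is now straightforward to see that

\begin{align}
\exp\left(\frac{2}{n}\,h\left(\sqrt{\mathsf{snr}}\,\mathbf{X}+\mathbf{N}\right)\right) &\ge \exp\left(\frac{2}{n}\,h\left(\sqrt{\mathsf{snr}}\,\alpha\mathbf{Z}+\mathbf{N}\right)\right) \tag{36} \\
&= \exp\left(\frac{2}{n}\cdot\frac{1}{2}\log\left((2\pi e)^n(\mathsf{snr}\,\alpha^2+1)^n\,|\mathbf{R}_N|\right)\right) \tag{37} \\
&= (2\pi e)(\mathsf{snr}\,\alpha^2+1)\,|\mathbf{R}_N|^{1/n} \tag{38} \\
&= \exp\left(\frac{2}{n}\,h\left(\sqrt{\mathsf{snr}}\,\alpha\mathbf{Z}\right)\right) + (2\pi e)\,|\mathbf{R}_N|^{1/n} \tag{39} \\
&= \exp\left(\frac{2}{n}\,h\left(\sqrt{\mathsf{snr}}\,\mathbf{X}\right)\right) + (2\pi e)\,|\mathbf{R}_N|^{1/n} \tag{40}
\end{align}

which is exactly (25) up to a scaling by $\sqrt{\mathsf{snr}}$, which we can always take equal to 1. We note here that the I-MMSE relationship was used in [19] to prove Shannon's EPI, Costa's EPI and also the generalized EPI for linear transformations of a random vector.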
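
For intuition, inequality (25) can be checked numerically in the one case where both sides have closed forms, namely when the input is itself Gaussian; it then reduces to the Minkowski determinant inequality. The sketch below is only such a sanity check under that assumption and does not exercise the general (non-Gaussian) statement. The function name, the seed and the way the random covariance matrices are generated are illustrative choices.

```python
import numpy as np

def entropy_power(cov):
    """exp((2/n) h(X)) for X ~ N(0, cov), which equals 2*pi*e*|cov|^(1/n)."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 2.0 * np.pi * np.e * np.exp(logdet / n)

rng = np.random.default_rng(2)
n = 3
slack = []
for _ in range(500):
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    R_X = A @ A.T + 0.1 * np.eye(n)   # covariance of the Gaussian input
    R_N = B @ B.T + 0.1 * np.eye(n)   # covariance of the Gaussian noise
    # Left-hand side of (25) minus its right-hand side.
    slack.append(entropy_power(R_X + R_N)
                 - entropy_power(R_X) - entropy_power(R_N))
print(min(slack))
```

The slack is nonnegative in every trial, and it vanishes when the input covariance is proportional to the noise covariance.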

III. From Scalar to Vector Channels: Definitions and Preliminaries 

In the previous section, we discussed the simple model presented in (13). We have shown that the "single crossing point" property initially proved for the scalar channel in [6] extends smoothly and intuitively to this model. The reason for the smooth transition is that, even though we are considering a multivariate scenario, all elements of the input vector undergo the same effect in the channel. They are all amplified by $\sqrt{\mathsf{snr}}$ and distorted by additive standard Gaussian noise. From a more technical viewpoint, when one wants to search for a "single crossing point" property, one must define some scalar function of some scalar parameter, for which the property holds. In the model of (13) the intuitive choice is simply to take the trace of the MMSE matrix as a function of snr. And indeed, this is just one possible linear combination included in Theorem 1, for which we have shown that the property can be extended.

Taking the next step, from this initial extension to the general model of (5), is a harder task. Moreover, there is no single method of doing so. In fact there are two degrees of freedom in this transition. First of all, there is a need for some scalar parameter that will define $\mathbf{H}$. This parameter will be equivalent to the snr parameter in the scalar case or the simple model of (13). Secondly, there is a need for some scalar function of the matrix $\mathbf{Q}$. In the simple model of (13) we defined the function $q_{\mathbf{A}}(\mathbf{X},\sigma^2,\gamma)$, which simply takes some linear (positive semidefinite) combination of the elements of the matrix. The trace function is one example of such a combination, which is also the most intuitive extension; however, in the general model (or even the parallel model, which we will discuss shortly) the "single crossing point" property does not hold, in general, for the trace function. Thus, our goal is to find a "single crossing point" property that will be both elegant and, more importantly, useful and applicable.

As such, in this work we narrowed our investigation to the subset of parallel channels or diagonal matrices $\mathbf{H}$, for which we have the following result.

Lemma 4: For any two diagonal channel matrices, $\mathbf{H}_1$ and $\mathbf{H}_2$, such that $\mathbf{0} \preceq \mathbf{H}_1 \preceq \mathbf{H}_2$, there exists a path $\mathbf{H}(t)$ such that the following holds:

• For all $t$, $\mathbf{H}(t) \succeq \mathbf{0}$ and is a diagonal matrix.

• For all $t$, $D_t\mathbf{H}(t) \succeq \mathbf{0}$ and is a diagonal matrix.

• $\mathbf{H}(0) = \mathbf{0}$.

• $\mathbf{H}(t_1) = \mathbf{H}_1$ and $\mathbf{H}(t_2) = \mathbf{H}_2$ where $0 < t_1 < t_2$.

• The diagonal elements of $\mathbf{H}(t)$ go to $\infty$ at a linear rate.

Proof: We need to define a function, $g_i(t)$, for each diagonal element of the matrix $\mathbf{H}(t)$. It suffices to choose any nonnegative function $h_i(t)$ such that the area from $0$ to $t_1$ equals $[\mathbf{H}_1]_{ii}$ and the area from $t_1$ to $t_2$ equals $[\mathbf{H}_2]_{ii} - [\mathbf{H}_1]_{ii}$. Given that, we can set the function to be $g_i(t) = \int_0^t h_i(\tau)\,\mathrm{d}\tau$. The entire path, $\mathbf{H}(t)$, will be given by:

$$\mathbf{H}(t) = \mathrm{Diag}(\{g_i(t)\}). \tag{41}$$

As required, this path passes through the zero matrix at $t = 0$, $\mathbf{H}_1$ at $t_1$ and $\mathbf{H}_2$ at $t_2$. Since the $h_i(t)$ are chosen nonnegative for all $i$, we have a nonnegative and monotonically nondecreasing path for all $t$. The above construction guarantees that both $\mathbf{H}(t)$ and $D_t\mathbf{H}(t)$ will be diagonal matrices for all $t$. Moreover, we may also assume that the functions $h_i(t)$ plateau at a positive constant after complying with all other requirements, that is, from $t_2$ onwards. This assures that $g_i(t)$ goes to $\infty$ at a linear rate. ■

Note that the above lemma can be extended to M matrices $\mathbf{H}_j \preceq \mathbf{H}_{j+1}$ for $j = 1, \ldots, M-1$, using a similar construction.
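
One concrete way to realize the construction in the proof of Lemma 4 is to take each h_i piecewise constant, so that each g_i is piecewise linear. The following sketch does exactly that, with slope one after the second breakpoint so that the diagonal grows linearly. The function name, the default breakpoints t1 = 1 and t2 = 2, and the example matrices are assumptions made for illustration; any other nonnegative choice with the same areas would serve equally well.

```python
import numpy as np

def build_path(H1, H2, t1=1.0, t2=2.0):
    """Return a function t -> H(t) with H(0) = 0, H(t1) = H1 and H(t2) = H2."""
    d1 = np.diag(H1).astype(float)
    d2 = np.diag(H2).astype(float)
    if not (0.0 < t1 < t2 and np.all(d1 >= 0) and np.all(d2 >= d1)):
        raise ValueError("need 0 < t1 < t2 and 0 <= H1 <= H2 on the diagonal")

    def H(t):
        if t <= t1:
            g = d1 * t / t1                          # area d1 on [0, t1]
        elif t <= t2:
            g = d1 + (d2 - d1) * (t - t1) / (t2 - t1)  # area d2 - d1 on [t1, t2]
        else:
            g = d2 + (t - t2)                        # unit slope: linear growth
        return np.diag(g)

    return H

H1 = np.diag([0.5, 0.0, 1.0])
H2 = np.diag([0.5, 2.0, 3.0])
path = build_path(H1, H2)
print(path(0.0), path(1.0), path(2.0), path(5.0), sep="\n")
```

The returned path is diagonal and entrywise nondecreasing for all t, passes through the two prescribed matrices, and grows without bound, matching the five requirements of the lemma.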

Under the above restriction to parallel channels, we examine two different cases: phases two and three of our extension. In phase two, detailed in Section IV, we assume that the Gaussian covariance matrix defining the matrix $\mathbf{Q}$ in (14) is that of a Gaussian distribution with independent elements, that is, $\mathbf{R}_{X_G} = \boldsymbol{\Lambda}_{X_G}$ is a diagonal matrix. In this case we will see that the "single crossing point" property occurs for each and every diagonal element of $\mathbf{Q}$. This is not a straightforward extension of the scalar property, since the elements of the random input vector $\mathbf{X}$ are, in general, not independent. In Section V we proceed to phase three, where we allow any Gaussian distribution in the definition of $\mathbf{Q}$. In this phase we will see that the "single crossing point" property occurs for each and every eigenvalue of the matrix $\mathbf{Q}$. This is not a straightforward extension of any of the previous results. Moreover, the results of Section IV cannot be trivially deduced from the results of phase three, since restricting only the Gaussian covariance to be diagonal does not guarantee that the eigenvalues of $\mathbf{Q}$ will be on its diagonal. Thus, we have two distinct results. All results (including those of the previous section) fall back to the scalar "single crossing point" property result [6], [20] when both the arbitrary input vector $\mathbf{X}$ and the Gaussian input random vector are restricted to have independent elements.

Before proceeding to examine these two cases we require a preliminary result. The basis for the applicability of 

March 1, 2013 DRAFT 



the "single crossing point" property in the scalar case and in the simple model of (5) is the I-MMSE relationship
[7]. This is still the case in the extensions we are considering next; however, we also require an extension of the
I-MMSE result which was derived by Palomar and Verdú in [17]:

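$$\nabla_{\mathbf{H}}\, I(\mathbf{X};\mathbf{H}\mathbf{X}+\mathbf{N}) = \mathbf{H}\mathbf{E}_X. \tag{42}$$

This relationship was derived for complex-valued variables; however, it holds verbatim for real-valued variables. Assuming the channel coefficients can be written as a function of a single parameter, $t$, we can rewrite the above relationship as an integral over this parameter, which results in the following expression:

\begin{align}
I(\mathbf{X};\mathbf{Y}(t)) &= I(\mathbf{X};\mathbf{H}(t)\mathbf{X}+\mathbf{N}) \notag \\
&= \int_{\tau=0}^{t} \mathbf{1}^T\left(\mathbf{H}(\tau)\mathbf{E}_X(\tau) \circ D_{\tau}\mathbf{H}(\tau)\right)\mathbf{1}\,\mathrm{d}\tau \notag \\
&= \int_{\tau=0}^{t} \mathrm{Tr}\left((\mathbf{H}(\tau)\mathbf{E}_X(\tau))^T D_{\tau}\mathbf{H}(\tau)\right)\mathrm{d}\tau \tag{43} \\
&= \int_{\tau=0}^{t} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{E}_X(\tau)\right)\mathrm{d}\tau \tag{44}
\end{align}

where we have used the following definition:

$$\mathbf{B}(t) = \mathbf{H}(t)\left(D_t\mathbf{H}(t)\right)^T. \tag{45}$$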
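
The path-integral form in (44) can be verified numerically for a Gaussian input, where the mutual information has the standard log-determinant closed form and the MMSE matrix is given by (8). The sketch below assumes a simple linear path in which the channel at time t is t times a fixed diagonal matrix; the covariance, the diagonal coefficients, the end point and the grid size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
R = A @ A.T + np.eye(n)              # covariance of the Gaussian input
c = np.array([0.5, 1.0, 2.0])        # H(t) = t * Diag(c), so D_t H(t) = Diag(c)
R_inv = np.linalg.inv(R)

def integrand(tau):
    H = tau * np.diag(c)
    E = np.linalg.inv(R_inv + H.T @ H)   # Gaussian MMSE matrix, as in (8)
    B = H @ np.diag(c).T                 # B(tau) = H(tau) (D_tau H(tau))^T
    return np.trace(B @ E)

t_end = 1.5
taus = np.linspace(0.0, t_end, 20001)
vals = np.array([integrand(tau) for tau in taus])
# Trapezoidal rule written out explicitly.
integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(taus))

H_end = t_end * np.diag(c)
_, logdet = np.linalg.slogdet(np.eye(n) + H_end @ R @ H_end.T)
closed_form = 0.5 * logdet           # Gaussian mutual information in nats
print(integral, closed_form)
```

The numerically integrated trace and the closed-form mutual information agree to within the quadrature error.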

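This also carries over to the conditioned case as follows:

\begin{align}
I(\mathbf{X};\mathbf{Y}(t) \mid \mathbf{U}) &= I(\mathbf{X};\mathbf{H}(t)\mathbf{X}+\mathbf{N} \mid \mathbf{U}) \notag \\
&= \int_{\tau=0}^{t} \mathrm{Tr}\left((\mathbf{H}(\tau)\mathbf{E}_{X|U}(\tau))^T D_{\tau}\mathbf{H}(\tau)\right)\mathrm{d}\tau \tag{46} \\
&= \int_{\tau=0}^{t} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{E}_{X|U}(\tau)\right)\mathrm{d}\tau. \tag{47}
\end{align}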

IV. Vector Channel: Comparing with an Independent Gaussian Distribution

We begin our analysis of the extended model (5), limited to parallel channel matrices, by assuming that the Gaussian covariance matrix, defining the matrix $\mathbf{Q}$, is that of an independent distribution, that is, $\mathbf{R}_{X_G} = \boldsymbol{\Lambda}_{X_G}$ throughout this section. Recall, nonetheless, that $\mathbf{X}$ remains completely arbitrary. More precisely, we consider the following matrix:

$$\mathbf{Q}(\mathbf{X},\boldsymbol{\Lambda}_{X_G},t) = \mathbf{E}_G(\boldsymbol{\Lambda}_{X_G},t) - \mathbf{E}_X(t). \tag{48}$$

Under these assumptions we will see, in Section IV-A, that a "single crossing point" property occurs for each and 
every diagonal element of the matrix Q. After extending this result to the conditioned case, in Section IV-B, we 
will use the I-MMSE relationship, in Section IV-C, to show the effect of this property on information-theoretic 
quantities, and more specifically on the mutual information. Finally, in Section IV-D, we will put these results to 
use on a variant of the degraded BC, in order to show their applicability to information theory problems. 


A. A Single Crossing Point Property on the Diagonal Elements of Q 

As pointed out above, our main result, in this section, is an extension of the "single crossing point" property. 
Precisely, we show that the property extends on each and every diagonal element of the matrix Q. This result is 
given in the next theorem. 

Theorem 2: The diagonal entries of the matrix-valued function t h^ Q(X, Axg,*), defined in (48), have no 
nonnegative-to-negative zero crossings and, at most, a single negative-to-nonnegative zero crossing in the range 
t e [0,oo). Moreover, let Iq e [0,cx)) be the negative-to-nonnegative crossing point for [Q(X, Axg)0]"- Then, 

1) [Q{X,AxaM^^<^■ 

2) [Q(X, AxG,i)]ii is a strictly increasing function in the range t G [0,to)- 

3) [q,{X,Axa,t)h > for all t e [snro,oo). 

4) Assuming limt^oo[H(i)]ji = oo, we have that \imt^c<,[Cl{X,AxG^'t)]ii = 0- 

5) [Q{X,Axayt)]ii is a continuous and monotonically increasing function in [Axdii- 
Proof: Before giving the actual proof, let us first present an intermediate result. 

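Lemma 5: Let $\mathbf{X} \in \mathbb{R}^n$ be a random vector such that $[\mathbf{R}_X]_{ii} \le [\mathbf{\Lambda}_{X_G}]_{ii}$, where $i \in [1, n]$. Then, for every $t \ge 0$, we have

$$[\mathbf{E}_X(t)]_{ii} \le [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii} = \frac{[\mathbf{\Lambda}_{X_G}]_{ii}}{1 + [\mathbf{H}(t)]_{ii}^2 [\mathbf{\Lambda}_{X_G}]_{ii}} \tag{50}$$

with equality if and only if $[\mathbf{X}]_i$ is Gaussian distributed, independent of the other entries of $\mathbf{X}$, and such that $[\mathbf{R}_X]_{ii} = [\mathbf{\Lambda}_{X_G}]_{ii}$.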

Proof: See Appendix A4. ■ 
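
To make Lemma 5 and the statement of Theorem 2 concrete, the following Python sketch evaluates a scalar instance numerically. It is an illustration only, under assumptions that are not part of the paper: a single channel ($n = 1$) with the path $[\mathbf{H}(t)]_{11} = \sqrt{t}$, an equiprobable binary input $X \in \{-1, +1\}$ (so $[\mathbf{R}_X]_{11} = 1$), and a Gaussian reference of variance $0.5$. The helper names `mmse_bpsk` and `mmse_gauss` are hypothetical.

```python
import numpy as np

def mmse_bpsk(t):
    """MMSE of an equiprobable X in {-1, +1} observed as Y = sqrt(t) X + N.

    Uses mmse(t) = 1 - E[tanh(t - sqrt(t) N)] with N standard Gaussian,
    evaluated by a Riemann sum on a fixed grid (assumed fine enough here).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    y = np.linspace(-10.0, 10.0, 2001)
    dy = y[1] - y[0]
    phi = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)
    integrand = phi * np.tanh(t[:, None] - np.sqrt(t)[:, None] * y[None, :])
    return 1.0 - integrand.sum(axis=1) * dy

def mmse_gauss(var, t):
    """Right-hand side of (50) for a scalar channel with [H(t)]^2 = t."""
    return var / (1.0 + t * var)

t = np.linspace(0.0, 30.0, 601)
q = mmse_gauss(0.5, t) - mmse_bpsk(t)      # scalar version of [Q]_ii in (48)

nonneg = q >= 0
up_crossings = int(np.sum(~nonneg[:-1] & nonneg[1:]))    # negative -> nonnegative
down_crossings = int(np.sum(nonneg[:-1] & ~nonneg[1:]))  # nonnegative -> negative
first_nonneg = int(np.argmax(nonneg))                    # grid index of the crossing

print("Q(0) =", q[0])
print("up crossings:", up_crossings, "down crossings:", down_crossings)
print("crossing located near t =", t[first_nonneg])
```

With these choices the scalar $[\mathbf{Q}]_{11}$ starts at $-0.5$ for $t = 0$ and, on this grid, changes sign once from negative to nonnegative and never back, which is the behavior the theorem describes.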

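Now, according to Lemma 5, for the case where $[\mathbf{R}_X]_{ii} < [\mathbf{\Lambda}_{X_G}]_{ii}$ the function $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ has no zeros and the statement in Theorem 2 is true. In addition, if $[\mathbf{X}]_i$ is Gaussian distributed (and independent of the other entries of the vector $\mathbf{X}$) with variance equal to $[\mathbf{R}_X]_{ii} = [\mathbf{\Lambda}_{X_G}]_{ii}$, then $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} = 0$, $\forall t$, which also fulfills Theorem 2.

Thus, from this point, we can assume that $[\mathbf{R}_X]_{ii} \ge [\mathbf{\Lambda}_{X_G}]_{ii}$ and that $[\mathbf{X}]_i$ is not simultaneously Gaussian distributed, independent of the other entries of $\mathbf{X}$, and with $[\mathbf{R}_X]_{ii} = [\mathbf{\Lambda}_{X_G}]_{ii}$. Now, for $t = 0$ we have $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, 0)]_{ii} = [\mathbf{\Lambda}_{X_G}]_{ii} - [\mathbf{R}_X]_{ii} \le 0$, as required.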
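As in the proof of Theorem 1, in order to prove that no nonnegative-to-negative and at most one negative-to-nonnegative zero crossing of $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ can occur, we only need to show that the derivative of $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ with respect to $t$ is positive for all values of $t$ for which $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$. Observe that $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$ implies

$$\frac{[\mathbf{\Lambda}_{X_G}]_{ii}}{1 + [\mathbf{H}(t)]_{ii}^2 [\mathbf{\Lambda}_{X_G}]_{ii}} = [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii} < [\mathbf{E}_X(t)]_{ii}. \tag{51}$$

Now, from (48), it is clear that, in order to compute the derivative of $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$, we first need the derivative of $[\mathbf{E}_X(t)]_{ii}$:

$$D_t [\mathbf{E}_X(t)]_{ii} = D_{\mathbf{H}(t)} [\mathbf{E}_X(t)]_{ii} \, D_t \mathbf{H}(t) \tag{52}$$

$$= \sum_{j=1}^{n} D_{[\mathbf{H}(t)]_{jj}} [\mathbf{E}_X(t)]_{ii} \, D_t [\mathbf{H}(t)]_{jj} \tag{53}$$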

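where, in the last step, we have used the assumption that $\mathbf{H}(t)$ is a diagonal matrix for all $t$. From [14, Eq. (131)], we have

$$D_{[\mathbf{H}(t)]_{jj}} [\mathbf{E}_X(t)]_{ii} = -2\, \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ij} [\mathbf{\Phi}_X(\mathbf{Y}) \mathbf{H}(t)]_{ij} \} \tag{54}$$

$$= -2 [\mathbf{H}(t)]_{jj}\, \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ij}^2 \}. \tag{55}$$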

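Recalling the definition $[\mathbf{B}(t)]_{ii} = [\mathbf{H}(t)]_{ii} D_t [\mathbf{H}(t)]_{ii}$ in (45), we are now ready to compute the derivative of $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$, which reads as

$$D_t [\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} \tag{56}$$

$$= 2 \sum_{j=1}^{n} [\mathbf{B}(t)]_{jj} \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ij}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ij}^2 \right) \tag{57}$$

$$= 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ii}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii}^2 \right) + 2 \sum_{j \ne i} [\mathbf{B}(t)]_{jj}\, \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ij}^2 \} \tag{58}$$

$$\ge 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ii}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii}^2 \right) \tag{59}$$

$$> 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ii}^2 \} - \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y})]_{ii} \} \right)^2 \right) \tag{60}$$

$$\ge 0 \tag{61}$$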

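where (57) follows from the fact that for Gaussian input distributions (not necessarily i.i.d.), the conditional MMSE matrix $\mathbf{\Phi}_{X_G}(\mathbf{y})$ does not depend on the observation $\mathbf{y}$, i.e., $\mathbf{E}_G(\mathbf{R}_{X_G}, t) = \mathbf{\Phi}_{X_G}$. Equation (58) is due to the fact that the entries of the Gaussian input distribution $\mathbf{X}_G$ are independent and, thus, its MMSE matrix is diagonal; (59) is due to the fact that $[\mathbf{B}(t)]_{jj} \ge 0$, as shown in Lemma 4; (60) follows from the assumption $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$, which by (51) gives $[\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii} < \mathsf{E}\{[\mathbf{\Phi}_X(\mathbf{Y})]_{ii}\}$; and (61) can be derived from Jensen's inequality.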

Observe that the inequality in (61), which holds for values of $t$ such that $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$, also proves the second item in Theorem 2, and the third one follows directly from the nonexistence of nonnegative-to-negative zero crossings. Regarding the fourth item, it is clear that $\lim_{t \to \infty} [\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} = 0$, as both terms in the expression of $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ in (49) tend to zero when $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$. Finally, the last property is a direct consequence of the definition of the function $\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ in (49). ■

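We now define the following function:

$$d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t) = [\mathbf{B}(t)]_{ii} [\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii} \tag{62}$$

and also,

$$d(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t) = \sum_{i=1}^{n} d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t). \tag{63}$$

Fig. 1. An example of the function $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ (in red) and the matching function $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ (in blue). Both have the same single negative-to-nonnegative zero crossing in the range $t \in (0, \infty)$.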



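For these functions we can give the following two corollaries.

Corollary 1: Let $\mathbf{X} \in \mathbb{R}^n$ be any random vector. The function $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ has the following properties:

1) $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, 0) = 0$.

2) It has at most a single negative-zero-positive crossing in the range $t \in (0, \infty)$.

3) When $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$ we have that $\lim_{t \to \infty} d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t) = 0$.

4) If $[\mathbf{\Lambda}_{X_G}]_{ii} = [\mathbf{R}_X]_{ii}$, then $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t$. Furthermore, $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ is a continuous and monotonically increasing function in $[\mathbf{\Lambda}_{X_G}]_{ii}$.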

Proof: The first three properties follow from Theorem 2 and the fact that $[\mathbf{B}(t)]_{ii}$ is zero at $t = 0$, nonnegative for all other values of $t \in (0, \infty)$, and goes to $\infty$ at a linear rate, as shown in Lemma 4. The fourth property is a direct result of Lemma 5 and the fifth item of Theorem 2. ■

Figure 1 illustrates this property, in which the negative-zero-positive crossing of $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ is simply a negative-to-nonnegative zero crossing and, thus, agrees with the negative-to-nonnegative zero crossing of $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$.

Corollary 2: Let $\mathbf{X} \in \mathbb{R}^n$ be any random vector. The function $d(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ is either negative for all $t$, or there exists $t' \in [0, \infty)$ such that for all $t \ge t'$ the function $d(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ is nonnegative. Moreover, when $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$ we have that $\lim_{t \to \infty} d(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t) = 0$, and if $[\mathbf{\Lambda}_{X_G}]_{ii} = [\mathbf{R}_X]_{ii}$ for all $i$, then $d(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t$.

B. The Conditioned Case 

Before proceeding to understanding the implications of the above results on information-theoretic quantities, we 
would like to extend these results to the conditioned case. 

Let us begin with the conditioned MMSE matrix. We first consider the following matrix quantity: 

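$$\mathbf{E}_{X|U}(t, u) = \mathsf{E}\{ (\mathbf{X} - \mathsf{E}\{\mathbf{X} \mid \mathbf{H}(t)\mathbf{X} + \mathbf{N}, U = u\}) (\mathbf{X} - \mathsf{E}\{\mathbf{X} \mid \mathbf{H}(t)\mathbf{X} + \mathbf{N}, U = u\})^T \mid U = u \} \tag{64}$$

$$= \mathsf{E}\{ (\mathbf{X}_u - \mathsf{E}\{\mathbf{X}_u \mid \mathbf{H}(t)\mathbf{X}_u + \mathbf{N}\}) (\mathbf{X}_u - \mathsf{E}\{\mathbf{X}_u \mid \mathbf{H}(t)\mathbf{X}_u + \mathbf{N}\})^T \} \tag{65}$$

where $\mathbf{X}_u$ is a random vector distributed according to $P_{X|U=u}$. The conditioned MMSE matrix is simply the expectation of (64) according to the distribution of the random vector $U$:

$$\mathbf{E}_{X|U}(t) = \mathsf{E}\{ \mathbf{E}_{X|U}(t, U) \} = \mathsf{E}\{ (\mathbf{X} - \mathsf{E}\{\mathbf{X} \mid \mathbf{H}(t)\mathbf{X} + \mathbf{N}, U\}) (\mathbf{X} - \mathsf{E}\{\mathbf{X} \mid \mathbf{H}(t)\mathbf{X} + \mathbf{N}, U\})^T \}. \tag{66}$$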

Another important quantity that needs to be extended to the conditioned case is: 

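$$\mathbf{\Phi}_{X_u}(\mathbf{y}) = \mathsf{E}\{ (\mathbf{X}_u - \mathsf{E}\{\mathbf{X}_u \mid \mathbf{y}\}) (\mathbf{X}_u - \mathsf{E}\{\mathbf{X}_u \mid \mathbf{y}\})^T \mid \mathbf{y} \} \tag{67}$$

$$= \mathsf{E}\{ (\mathbf{X} - \mathsf{E}\{\mathbf{X} \mid \mathbf{y}, U = u\}) (\mathbf{X} - \mathsf{E}\{\mathbf{X} \mid \mathbf{y}, U = u\})^T \mid \mathbf{y}, U = u \} \tag{68}$$

$$= \mathbf{\Phi}_X(\mathbf{y}, U = u) \tag{69}$$

where, as in the unconditioned case, this function in general depends on both $u$ and $\mathbf{y}$; thus, we have $\mathbf{E}_{X|U}(t, u) = \mathsf{E}\{\mathbf{\Phi}_X(\mathbf{Y}, U = u)\}$, where the expectation is over $\mathbf{Y}$. However, when the input distribution of $\mathbf{X}_u$ is Gaussian, $\mathbf{\Phi}_X(\mathbf{y}, U = u)$ is independent of $\mathbf{y}$. In a similar manner, we have the following:

$$\mathbf{Q}(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t) = \mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t) - \mathbf{E}_{X|U}(t, u) \tag{70}$$

and, thus, we also have:

$$\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) = \mathsf{E}_U\{ \mathbf{Q}(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t) \} = \mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t) - \mathbf{E}_{X|U}(t). \tag{71}$$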

Using these definitions we can now extend the results of Theorem 2 to the conditioned case in the following 
theorem. 

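Theorem 3: Let $U - \mathbf{X} - \mathbf{Y}$ form a Markov chain. Then, the diagonal entries of the matrix-valued function $t \mapsto \mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$, defined in (71), have no nonnegative-to-negative zero crossings and, at most, a single negative-to-nonnegative zero crossing in the range $t \in [0, \infty)$. Moreover, let $t_0 \in [0, \infty)$ be the negative-to-nonnegative crossing point for $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$. Then,

1) $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, 0)]_{ii} \le 0$.

2) $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ is a strictly increasing function in the range $t \in [0, t_0)$.

3) $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} \ge 0$ for all $t \in [t_0, \infty)$.

4) When $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$ we have that $\lim_{t \to \infty} [\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} = 0$.

5) $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ is a continuous and monotonically increasing function in $[\mathbf{\Lambda}_{X_G}]_{ii}$.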

Proof: If $[\mathbf{X}]_i$ is Gaussian distributed (independent of $U$ and independent of the other entries of the vector $\mathbf{X}$) with variance equal to $[\mathbf{R}_X]_{ii} = [\mathbf{\Lambda}_{X_G}]_{ii}$, then $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} = 0$, $\forall t$, which also fulfills Theorem 3. Thus, from this point, we can assume that $[\mathbf{X}]_i$ is not "Gaussian distributed, independent of $U$ and independent of the other entries of $\mathbf{X}$, and such that $[\mathbf{R}_X]_{ii} = [\mathbf{\Lambda}_{X_G}]_{ii}$".

In this conditioned case, it is harder to determine, up front, all cases in which the function $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ has no zeros. Thus, contrary to the approach used in the proof of Theorem 2, we first prove that no nonnegative-to-negative and at most one negative-to-nonnegative zero crossing of $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ can occur. The first property is a direct consequence of this, and there is no need to determine the exact conditions under which the function has no zeros. This approach could also have been used in proving Theorem 2; however, in the unconditioned case we can easily determine the set of cases in which $[\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ has no zeros.

Similarly to the proof of Theorem 2, in order to prove that no nonnegative-to-negative and at most one negative-to-nonnegative zero crossing of $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ can occur, we only need to show that the derivative of $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ with respect to $t$ is positive for all values of $t$ for which $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$. According to equations (57) and (59) we have the following lower bound:

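$$D_t [\mathbf{Q}(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t)]_{ii} = 2 \sum_{j=1}^{n} [\mathbf{B}(t)]_{jj} \left( \mathsf{E}\{ [\mathbf{\Phi}_{X_u}(\mathbf{Y})]_{ij}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ij}^2 \right) \tag{72}$$

$$\ge 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_{X_u}(\mathbf{Y})]_{ii}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii}^2 \right). \tag{73}$$

Now we can take the expectation over $U$ on both sides and obtain the following:

$$D_t [\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} = \mathsf{E}_U\{ D_t [\mathbf{Q}(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t)]_{ii} \} \tag{74}$$

$$\ge \mathsf{E}_U\{ 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_{X_u}(\mathbf{Y})]_{ii}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii}^2 \right) \} \tag{75}$$

$$= 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y}, U)]_{ii}^2 \} - [\mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t)]_{ii}^2 \right) \tag{76}$$

$$> 2 [\mathbf{B}(t)]_{ii} \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y}, U)]_{ii}^2 \} - \left( \mathsf{E}\{ [\mathbf{\Phi}_X(\mathbf{Y}, U)]_{ii} \} \right)^2 \right) \tag{77}$$

$$\ge 0 \tag{78}$$

where (75) is due to (73), (77) follows from the assumption $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$, and (78) can be derived from Jensen's inequality.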

Observe that the inequality in (77), which holds for values of $t$ such that $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} < 0$, also proves the second item in Theorem 3, and the third one follows directly from the nonexistence of nonnegative-to-negative zero crossings. Regarding the fourth item, it is clear that $\lim_{t \to \infty} [\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} = 0$, as both terms in the expression of $[\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii}$ in (71) tend to zero when $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$. Finally, the last property is a direct consequence of the definition of the function $\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$ in (71). ■


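We now extend the definition of the function $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ in (62) and the function $d(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ in (63) to the conditioned case:

$$d_i(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t) = [\mathbf{B}(t)]_{ii} [\mathbf{Q}(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t)]_{ii} \tag{79}$$

$$d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) = [\mathbf{B}(t)]_{ii} [\mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)]_{ii} \tag{80}$$

and also,

$$d(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t) = \sum_{i=1}^{n} d_i(\mathbf{X}|U = u, \mathbf{\Lambda}_{X_G}, t) \tag{81}$$

$$d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) = \sum_{i=1}^{n} d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t). \tag{82}$$

For these functions we can extend Corollaries 1 and 2 as follows.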

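Corollary 3: Let $U - \mathbf{X} - \mathbf{Y}$ form a Markov chain such that the random vector $\mathbf{X}|U = u \in \mathbb{R}^n$ has covariance matrix $\mathbf{R}_{X|U=u}$. The function $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$ has the following properties:

1) $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, 0) = 0$.

2) It has at most a single negative-zero-positive crossing in the range $t \in (0, \infty)$.

3) When $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$ we have that $\lim_{t \to \infty} d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) = 0$.

4) If $[\mathbf{\Lambda}_{X_G}]_{ii} = [\mathbf{R}_X]_{ii}$, then $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t$. Furthermore, $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$ is a continuous and monotonically increasing function in $[\mathbf{\Lambda}_{X_G}]_{ii}$.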

Proof: The first three properties follow directly from Theorem 3 and the fact that $[\mathbf{B}(t)]_{ii}$ is zero at $t = 0$, nonnegative for all other values of $t \in (0, \infty)$, and goes to $\infty$ at a linear rate, as shown in Lemma 4. The fourth property is a direct result of Lemma 5 and the fifth property in Theorem 3. ■

Corollary 4: Let $U - \mathbf{X} - \mathbf{Y}$ form a Markov chain. The function $d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$ is either negative for all $t$, or there exists $t' \in [0, \infty)$ such that for all $t \ge t'$ the function $d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$ is nonnegative. Moreover, when $\lim_{t \to \infty} [\mathbf{H}(t)]_{ii} = \infty$ we have that $\lim_{t \to \infty} d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) = 0$, and if $[\mathbf{\Lambda}_{X_G}]_{ii} = [\mathbf{R}_X]_{ii}$ for all $i$, then $d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t$.

C. Properties of the Mutual Information

So far, we have seen properties of the matrix $\mathbf{Q}(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ or, more precisely, of its diagonal elements. We have seen that these properties extend naturally to the conditioned case, and also to the function $d_i(\mathbf{X}, \mathbf{\Lambda}_{X_G}, t)$ and its conditioned version. In this section, our goal is to use these results to derive new properties of the mutual information between the input and the output of parallel Gaussian channels. In order to derive these results we put to use the I-MMSE relationship, as given in equations (43)-(44) and (46)-(47).

For the sake of compactness we will write the properties in this section only for the more general, conditioned case, from which one can easily derive the respective unconditioned theorems.

Theorem 4: Let $U - \mathbf{X} - \mathbf{Y}$ form a Markov chain. Assume an independent Gaussian input, $\mathbf{X}_G$, with covariance $\mathbf{\Lambda}_{X_G}$ such that for all $i$,

$$I([\mathbf{X}]_i; [\mathbf{Y}(t_e)]_i \mid U) = I([\mathbf{X}_G]_i; [\mathbf{Y}_G(t_e)]_i) \tag{83}$$

where

$$\mathbf{Y}(t_e) = \mathbf{H}(t_e)\mathbf{X} + \mathbf{N} \tag{84}$$

and

$$\mathbf{Y}_G(t_e) = \mathbf{H}(t_e)\mathbf{X}_G + \mathbf{N}. \tag{85}$$

Then $d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t \ge t_e$.

Proof: Let us define $\mathbf{X}^{\text{ind. cond.}} \in \mathbb{R}^n$ as a random vector with independent elements when conditioned on $U$, and with the distribution of each pair $([\mathbf{X}^{\text{ind. cond.}}]_i, U)$ being the same as the marginal distribution of the corresponding pair $([\mathbf{X}]_i, U)$. Thus, $[\mathbf{E}_{X^{\text{ind. cond.}}|U}(t)]_{ii}$ is basically the MMSE of $[\mathbf{X}^{\text{ind. cond.}}]_i$ from $U$ and $[\mathbf{Y}(t)]_i$, which is:

$$[\mathbf{Y}(t)]_i = [\mathbf{H}(t)\mathbf{X}^{\text{ind. cond.}} + \mathbf{N}]_i = [\mathbf{H}(t)]_{ii} [\mathbf{X}^{\text{ind. cond.}}]_i + [\mathbf{N}]_i \tag{86}$$

where the equality holds due to the fact that the channel matrix $\mathbf{H}(t)$ is diagonal for all $t$ and $\mathbf{N}$ is standard Gaussian. Using these definitions we can give the following special case of (47):



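$$I([\mathbf{X}]_i; [\mathbf{Y}(t)]_i \mid U) = I([\mathbf{X}^{\text{ind. cond.}}]_i; [\mathbf{H}(t)]_{ii} [\mathbf{X}^{\text{ind. cond.}}]_i + [\mathbf{N}]_i \mid U)$$

$$= \int_{\tau=0}^{t} [\mathbf{H}(\tau)]_{ii} [\mathbf{E}_{X^{\text{ind. cond.}}|U}(\tau)]_{ii} \, D_\tau [\mathbf{H}(\tau)]_{ii} \, d\tau \tag{87}$$

$$= \int_{\tau=0}^{t} [\mathbf{B}(\tau)]_{ii} [\mathbf{E}_{X^{\text{ind. cond.}}|U}(\tau)]_{ii} \, d\tau. \tag{88}$$

Putting this together with the assumption, we have,

$$0 = I([\mathbf{X}_G]_i; [\mathbf{Y}_G(t_e)]_i) - I([\mathbf{X}]_i; [\mathbf{Y}(t_e)]_i \mid U) \tag{89}$$

$$= \int_{\tau=0}^{t_e} d_i(\mathbf{X}^{\text{ind. cond.}}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau. \tag{90}$$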



Now, due to Corollary 4 we can conclude that there exists a $t_0 \in [0, t_e]$ such that $d_i(\mathbf{X}^{\text{ind. cond.}}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t \ge t_0$ and, as a result, $d_i(\mathbf{X}^{\text{ind. cond.}}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t \ge t_e$. Now, for all $t$ we have that $[\mathbf{E}_{X|U}(t)]_{ii} \le [\mathbf{E}_{X^{\text{ind. cond.}}|U}(t)]_{ii}$. Thus, if the negative-zero-positive crossing of $d_i(\mathbf{X}^{\text{ind. cond.}}|U, \mathbf{\Lambda}_{X_G}, t)$ is at $t_0$, the negative-zero-positive crossing of $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$ is at a $t_p \le t_0$. From this we can conclude that also $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t \ge t_e$. Finally, since this holds for every $i$, it also holds for the summation over $i$, i.e., for the function $d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t)$, concluding the proof. ■

We are now ready to give the main theorem of this section. 

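Theorem 5: Let $U - \mathbf{X} - \mathbf{Y}$ form a Markov chain. For any $t_e \in [0, \infty)$, there exists an independent Gaussian input, $\mathbf{X}_G$, with covariance $\mathbf{\Lambda}_{X_G}$ such that the following properties hold:

1) $d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t \ge t_e$.

2) $I(\mathbf{X}; \mathbf{Y}(t_e) \mid U) = I(\mathbf{X}_G; \mathbf{Y}_G(t_e))$, where $\mathbf{Y}(t_e)$ and $\mathbf{Y}_G(t_e)$ are as defined in (84) and (85), respectively.

3) $[\mathbf{\Lambda}_{X_G}]_{ii} \le [\mathbf{R}_X]_{ii}$ for all $i$.

Proof: We provide a constructive proof, and show how one can build an independent Gaussian input distribution complying with all three requirements. We begin by examining the meaning of the second requirement. First, recall the I-MMSE relationship in the parallel setting, given in equation (47),

$$I(\mathbf{X}; \mathbf{Y}(t_e) \mid U) = \int_{\tau=0}^{t_e} \mathrm{Tr}\left( \mathbf{B}(\tau) \mathbf{E}_{X|U}(\tau) \right) d\tau. \tag{91}$$

Now, the second requirement is equivalent to the following equality,

$$0 = I(\mathbf{X}_G; \mathbf{Y}_G(t_e)) - I(\mathbf{X}; \mathbf{Y}(t_e) \mid U) = \int_{\tau=0}^{t_e} \mathrm{Tr}\left( \mathbf{B}(\tau) \mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \right) d\tau \tag{92}$$

$$= \int_{\tau=0}^{t_e} \sum_{i=1}^{n} d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau$$

$$= \sum_{i=1}^{n} \int_{\tau=0}^{t_e} d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau.$$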

Thus, we wish to show the existence of an independent Gaussian input distribution which complies with requirements 1, 3 and (92). There are different ways to attain equality in (92); however, since we need only to show the existence of a specific independent Gaussian distribution, we follow one possible approach, which is to require the following,

$$\int_{\tau=0}^{t_e} d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau = 0, \quad \forall i. \tag{93}$$

Now, according to the fourth property in Corollary 3 we know that,

$$d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0 \tag{94}$$

for all $t$, when $[\mathbf{\Lambda}_{X_G}]_{ii} = [\mathbf{R}_X]_{ii}$, and that it is continuous and monotonically increasing in the value of $[\mathbf{\Lambda}_{X_G}]_{ii}$ (and trivially negative, for all $t$, when $[\mathbf{\Lambda}_{X_G}]_{ii} = 0$). Thus, there exists a number $\eta_i \in [0, 1]$ such that setting $[\mathbf{\Lambda}_{X_G}]_{ii} = \eta_i [\mathbf{R}_X]_{ii}$ results in equality in (93). Due to the second property in Corollary 3, we know that either $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) = 0$ for all $t$ or that there exists a single negative-zero-positive crossing in the range $[0, t_e]$. In both cases the setting $[\mathbf{\Lambda}_{X_G}]_{ii} = \eta_i [\mathbf{R}_X]_{ii}$ results in $d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0$ for all $t \ge t_e$. Since there exists such an $\eta_i$ for every $i$, we comply also with requirements 1 and 3, and conclude the proof. ■
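
The construction in the proof can be traced numerically in the simplest setting. The sketch below is not the authors' procedure; it assumes a scalar channel ($n = 1$), no conditioning variable $U$, the path $[\mathbf{H}(t)]_{11} = \sqrt{t}$ (so that $[\mathbf{B}(t)]_{11} = 1/2$), and an equiprobable binary input with unit variance. In this scalar case the scaling factor of the proof, called `eta` here, has a closed form because the Gaussian mutual information is $\frac{1}{2}\log(1 + t_e \eta)$ in nats. All function and variable names are hypothetical.

```python
import numpy as np

def mmse_bpsk(t):
    """MMSE of an equiprobable X in {-1, +1} observed as Y = sqrt(t) X + N."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    y = np.linspace(-10.0, 10.0, 2001)
    dy = y[1] - y[0]
    phi = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)
    integrand = phi * np.tanh(t[:, None] - np.sqrt(t)[:, None] * y[None, :])
    return 1.0 - integrand.sum(axis=1) * dy

t_e = 2.0

# I-MMSE relationship (91) in the scalar case: I(t_e) = int_0^{t_e} B(tau) mmse(tau) dtau,
# with B(tau) = 1/2 for the assumed path h(t) = sqrt(t). Trapezoidal rule on a fine grid.
tau = np.linspace(0.0, t_e, 801)
m = mmse_bpsk(tau)
mi_binary = 0.5 * np.sum(0.5 * (m[1:] + m[:-1]) * np.diff(tau))   # in nats

# Choose the Gaussian variance eta * R_X (R_X = 1) so that requirement 2 holds:
# 0.5 * log(1 + t_e * eta) = mi_binary.
eta = (np.exp(2.0 * mi_binary) - 1.0) / t_e

# Requirement 1 in the scalar case: Q(t) = E_G(t) - E_X(t) >= 0 for all t >= t_e.
t = np.linspace(t_e, 40.0, 400)
q_after = eta / (1.0 + t * eta) - mmse_bpsk(t)

print("mutual information at t_e (nats):", mi_binary)
print("eta:", eta)
print("min of Q on [t_e, 40]:", q_after.min())
```

The value of `eta` falls in $[0, 1]$, as requirement 3 demands, and the difference of MMSEs stays nonnegative beyond $t_e$ on the evaluated grid.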

Remark 4: Note that the above choice of Axa does not necessarily imply Axa ^ R-x- However, we can 
conclude that Ax^ ^ R-x- 

The following is a simple corollary of the above theorem. 

Corollary 5: Given any arbitrary independent input distribution over $\mathbf{X} \in \mathbb{R}^n$, with covariance $\mathbf{\Lambda}_X$, and any $t_e$, there exists an independent Gaussian input, $\mathbf{X}_G$, with covariance $\mathbf{\Lambda}_{X_G}$ such that

$$I(\mathbf{X}; \mathbf{H}(t_e)\mathbf{X} + \mathbf{N}) = I(\mathbf{X}_G; \mathbf{H}(t_e)\mathbf{X}_G + \mathbf{N}) \tag{95}$$

$$\mathbf{\Lambda}_{X_G} \preceq \mathbf{\Lambda}_X \tag{96}$$

$$\text{and} \quad \mathbf{E}_G(\mathbf{\Lambda}_{X_G}, t_e) \succeq \mathbf{E}_X(t_e). \tag{97}$$

D. Application: The Degraded Parallel Gaussian BC Capacity Region under Per-antenna Power Constraint

We now show that Theorem 5 can be used in providing a converse proof for the degraded parallel Gaussian BC capacity region under a per-antenna power constraint. We consider the following model,

$$\mathbf{Y}_1[m] = \mathbf{H}_1 \mathbf{X}[m] + \mathbf{N}_1[m]$$

$$\mathbf{Y}_2[m] = \mathbf{H}_2 \mathbf{X}[m] + \mathbf{N}_2[m] \tag{98}$$

where $\mathbf{N}_1[m]$ and $\mathbf{N}_2[m]$ are standard additive Gaussian noise vectors, independent for different time indices $m$, and $\mathbf{H}_1$ and $\mathbf{H}_2$ are diagonal positive semidefinite matrices such that $\mathbf{H}_1 \preceq \mathbf{H}_2$. $\mathbf{X} \in \mathbb{R}^n$ is the random input vector, and it is assumed independent for different time indices $m$. Note that $m$ is the time index and should not be confused with the scalar parameter $t$, which is used as a "MIMO snr parameter", i.e., the parameter $t$ determines the channel matrix $\mathbf{H}(t)$.

We consider a per-antenna power constraint:

$$[\mathsf{E}\{\mathbf{X}\mathbf{X}^T\}]_{ii} \le P_i, \quad \forall i, \; 1 \le i \le n. \tag{99}$$

Since we have a degraded BC, we can use the single-letter expression given in [21],

$$R_1 \le I(U; \mathbf{Y}_1)$$

$$R_2 \le I(\mathbf{X}; \mathbf{Y}_2 \mid U) \tag{100}$$

where $U$ is an auxiliary random vector over a certain alphabet that satisfies the Markov relation $U - \mathbf{X} - (\mathbf{Y}_1, \mathbf{Y}_2)$. The following proof was originally given for the scalar Gaussian BC in [6], [20], and we now extend it to the degraded parallel Gaussian channel. Using Lemma 4 we can construct a path such that:

$$\mathbf{H}(t_2) = \mathbf{H}_2$$

$$\mathbf{H}(t_1) = \mathbf{H}_1$$

$$\mathbf{H}(0) = \mathbf{0} \tag{101}$$

where $0 \le t_1 \le t_2$ and $\mathbf{H}(t)$ is diagonal for all $t \in [0, t_2]$.

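Now, assume a pair $(U, \mathbf{X})$ such that $\mathbf{X}$ has covariance $\mathbf{R}_X$. According to Theorem 5, there exists an independent Gaussian vector, $\mathbf{X}_G$, with covariance matrix $\mathbf{\Lambda}_{X_G}$ such that the following properties hold:

$$I(\mathbf{X}; \mathbf{Y}_1 \mid U) = I(\mathbf{X}; \mathbf{H}(t_1)\mathbf{X} + \mathbf{N} \mid U) = I(\mathbf{X}_G; \mathbf{Y}_G(t_1)) = I(\mathbf{X}_G; \mathbf{H}(t_1)\mathbf{X}_G + \mathbf{N}) \tag{102}$$

$$d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, t) \ge 0, \quad \forall t \ge t_1 \tag{103}$$

$$[\mathbf{\Lambda}_{X_G}]_{ii} \le [\mathbf{R}_X]_{ii}, \quad \forall i. \tag{104}$$

Using the I-MMSE relationship (47) we can write,

$$I(\mathbf{X}_G; \mathbf{H}(t)\mathbf{X}_G + \mathbf{N}) - I(\mathbf{X}; \mathbf{H}(t)\mathbf{X} + \mathbf{N} \mid U) = \int_{\tau=0}^{t} \mathrm{Tr}\left( \mathbf{B}(\tau) \mathbf{Q}(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \right) d\tau \tag{105}$$

$$= \int_{\tau=0}^{t} \sum_{i=1}^{n} d_i(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau \tag{106}$$

$$= \int_{\tau=0}^{t} d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau. \tag{107}$$

Using the above properties on (107) we have that for any $t' \ge t_1$,

$$I(\mathbf{X}_G; \mathbf{H}(t')\mathbf{X}_G + \mathbf{N}) - I(\mathbf{X}; \mathbf{H}(t')\mathbf{X} + \mathbf{N} \mid U) = \int_{\tau=0}^{t_1} d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau + \int_{\tau=t_1}^{t'} d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau \tag{108}$$

$$= 0 + \int_{\tau=t_1}^{t'} d(\mathbf{X}|U, \mathbf{\Lambda}_{X_G}, \tau) \, d\tau \ge 0 \tag{109}$$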

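where the second transition is due to (102) and the inequality is due to (103). Thus, we have shown the existence of an independent Gaussian vector, $\mathbf{X}_G$, with covariance matrix $\mathbf{\Lambda}_{X_G}$, with the following properties:

$$I(\mathbf{X}; \mathbf{H}_1\mathbf{X} + \mathbf{N} \mid U) = \tfrac{1}{2} \log \left| \mathbf{I} + \mathbf{H}_1 \mathbf{\Lambda}_{X_G} \mathbf{H}_1^T \right| \tag{110}$$

$$I(\mathbf{X}; \mathbf{H}_2\mathbf{X} + \mathbf{N} \mid U) \le \tfrac{1}{2} \log \left| \mathbf{I} + \mathbf{H}_2 \mathbf{\Lambda}_{X_G} \mathbf{H}_2^T \right| \tag{111}$$

$$\text{and} \quad [\mathbf{\Lambda}_{X_G}]_{ii} \le [\mathbf{R}_X]_{ii}, \quad \forall i. \tag{112}$$

Using these properties on the single-letter expression (100) we obtain the following outer bound:

$$R_1 \le I(U; \mathbf{Y}_1) = I(\mathbf{X}; \mathbf{Y}_1) - I(\mathbf{X}; \mathbf{Y}_1 \mid U)$$

$$\le \tfrac{1}{2} \log \left| \mathbf{I} + \mathbf{H}_1 \mathbf{P} \mathbf{H}_1^T \right| - \tfrac{1}{2} \log \left| \mathbf{I} + \mathbf{H}_1 \mathbf{\Lambda}_{X_G} \mathbf{H}_1^T \right| = \tfrac{1}{2} \log \frac{\left| \mathbf{I} + \mathbf{H}_1 \mathbf{P} \mathbf{H}_1^T \right|}{\left| \mathbf{I} + \mathbf{H}_1 \mathbf{\Lambda}_{X_G} \mathbf{H}_1^T \right|} \tag{113}$$

$$R_2 \le I(\mathbf{X}; \mathbf{Y}_2 \mid U) \le \tfrac{1}{2} \log \left| \mathbf{I} + \mathbf{H}_2 \mathbf{\Lambda}_{X_G} \mathbf{H}_2^T \right| \tag{114}$$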

where P is a diagonal matrix with [P]ii = Pi for all i. This outer bound is tight and the achievability is well-known 
using superposition coding. This approach can be extended to the M-user scenario as shown in Appendix B. 
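
As a small illustration of how the two bounds (113) and (114) trade off against each other, the following Python sketch evaluates them for a hypothetical two-antenna example. The channel gains, the power constraint, and the one-parameter family $\mathbf{\Lambda}_{X_G} = \eta \mathbf{P}$ are assumptions made for the example only (the bound itself allows any diagonal $\mathbf{\Lambda}_{X_G}$ with $[\mathbf{\Lambda}_{X_G}]_{ii} \le P_i$); rates are in nats, and the function names are hypothetical.

```python
import numpy as np

def logdet_i_plus(H, K):
    """Return log|I + H K H^T| using a numerically stable log-determinant."""
    n = H.shape[0]
    _, value = np.linalg.slogdet(np.eye(n) + H @ K @ H.T)
    return value

def outer_bound_rates(H1, H2, P, Lam):
    """Evaluate the right-hand sides of (113) and (114) for a given Lambda."""
    r1 = 0.5 * (logdet_i_plus(H1, P) - logdet_i_plus(H1, Lam))
    r2 = 0.5 * logdet_i_plus(H2, Lam)
    return r1, r2

# Hypothetical example values: diagonal channels with H1 <= H2 entrywise.
H1 = np.diag([0.5, 1.0])
H2 = np.diag([1.0, 2.0])
P = np.diag([1.0, 2.0])          # per-antenna power constraints P_i

for eta in (0.0, 0.25, 0.5, 0.75, 1.0):
    r1, r2 = outer_bound_rates(H1, H2, P, eta * P)
    print(f"eta = {eta:4.2f}:  R1 <= {r1:.4f}   R2 <= {r2:.4f}")
```

Sweeping `eta` from 0 to 1 moves power from the common message to the private one: the bound on $R_1$ decreases to zero while the bound on $R_2$ grows.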

V. Vector Channel: Comparing with a General Gaussian Distribution 

In this section we extend our analysis of the previous section. We continue looking into the model given in (5), 
limited to parallel channel matrices, however we now allow the Gaussian covariance matrix, defining the matrix Q, 
to be any proper covariance matrix. In other words, we no longer limit ourselves to independent Gaussian inputs. 
For this, more general setting, we will see in Section V-A that a "single crossing point" property occurs for each 
and every eigenvalue of the matrix Q. After extending this result to the conditioned case, in Section V-B, we will 
use the I-MMSE relationship, in Section V-C, to show the effect of this property on information-theoretic quantities, 
and more specifically on the mutual information. We will relate these results to the Fisher information in Section 
V-D. Finally, in Sections V-E and V-F we will put these results to use in the degraded BC capacity converse proof, 
for both the compound and non-compound scenarios.

A. Single Crossing Point for Each Eigenvalue of $\mathbf{Q}(t)$

In this section we prove the main result of this paper: showing that each eigenvalue of the matrix Q has at most a 
single negative-to-nonnegative zero crossing. This is, to our understanding, not an intuitive extension of the "single 
crossing point" property, which emphasizes the importance of the eigenvalues in the analysis of MIMO scenarios. 

For the proof of the main theorem, we require the following lemma, which might also be of interest on its own. 

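Lemma 6: The following lower bound holds:

$$D_t \mathbf{Q}(\mathbf{X}, \mathbf{R}_{X_G}, t) \succeq 2 \left( \mathbf{E}_X(t) \mathbf{B}(t) \mathbf{E}_X^T(t) - \mathbf{E}_G(t) \mathbf{B}(t) \mathbf{E}_G^T(t) \right) \tag{115}$$

where $\mathbf{B}(t)$ was defined in (45) and is assumed to be a positive semidefinite diagonal matrix for all $t$ (see Lemma 4).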

Proof: See Appendix A5. ■ 

We are now ready to proceed to the main result of the paper: 

Theorem 6: Each eigenvalue of $\mathbf{Q}(\mathbf{X}, \mathbf{R}_{X_G}, t)$ has, at most, a single negative-to-nonnegative zero crossing of the horizontal axis.

Proof: Loosely speaking, the proof is based on proving that, once an eigenvalue has become (or is) nonnegative, it cannot become negative. Thus, from the (weak) continuity of the eigenvalues as a function of $t$, which follows from [22, App. D], the eigenvalues can cross the horizontal axis, at most, once. Also from continuity arguments, it is easy to see that we must limit our study of the eigenvalues of $\mathbf{Q}(\mathbf{X}, \mathbf{R}_{X_G}, t)$ to the values of $t$ where the matrix $\mathbf{Q}(\mathbf{X}, \mathbf{R}_{X_G}, t)$ is singular (i.e., a subset of its eigenvalues are zero), as it is the only possible situation where a zero crossing can occur. Finally, throughout this proof and for the sake of simplicity we will use the simplified notation $\mathbf{Q}(\mathbf{X}, \mathbf{R}_{X_G}, t) = \mathbf{Q}(t) = \mathbf{E}_G(t) - \mathbf{E}_X(t)$, because the entire proof is given for any constant setting of the input random vector $\mathbf{X}$ and the Gaussian covariance $\mathbf{R}_{X_G}$.

We begin by stating a few supporting results and giving some preliminary definitions. 

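Lemma 7: Let $\mathbf{A}$ and $\mathbf{B}$ be two $n$-dimensional positive semidefinite matrices, i.e., $\mathbf{A} \succeq 0$, $\mathbf{B} \succeq 0$. Then, there exists an invertible matrix $\mathbf{S}$ such that both $\mathbf{S}\mathbf{A}\mathbf{S}^T$ and $\mathbf{S}\mathbf{B}\mathbf{S}^T$ are diagonal matrices.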

Proof: See Appendix A6. ■ 
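
For readers who want to experiment with Lemma 7, the following Python sketch builds such a matrix $\mathbf{S}$ in the easier special case where $\mathbf{A}$ is strictly positive definite. This restriction, the Cholesky-based construction, and the function name `simultaneous_diagonalize` are assumptions of the sketch; the lemma itself covers the positive semidefinite case, and the construction used in Appendix A6 may differ.

```python
import numpy as np

def simultaneous_diagonalize(A, B):
    """Return an invertible S such that S A S^T and S B S^T are both diagonal.

    Sketch for the special case A positive definite, B symmetric:
    whiten A with its Cholesky factor, then rotate with the eigenvectors
    of the whitened B.
    """
    L = np.linalg.cholesky(A)            # A = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ B @ L_inv.T              # whitened version of B
    M = 0.5 * (M + M.T)                  # remove rounding asymmetry
    _, W = np.linalg.eigh(M)             # M = W diag(.) W^T, W orthogonal
    return W.T @ L_inv

rng = np.random.default_rng(0)
G1 = rng.standard_normal((4, 4))
G2 = rng.standard_normal((4, 3))
A = G1 @ G1.T + 0.1 * np.eye(4)          # positive definite
B = G2 @ G2.T                            # positive semidefinite, rank 3

S = simultaneous_diagonalize(A, B)
DA = S @ A @ S.T                         # equals the identity in this construction
DB = S @ B @ S.T                         # diagonal, with the eigenvalues of M

print(np.round(DA, 8))
print(np.round(DB, 8))
```

In this construction $\mathbf{S}\mathbf{A}\mathbf{S}^T$ is the identity and $\mathbf{S}\mathbf{B}\mathbf{S}^T$ carries the eigenvalues of the whitened matrix, which is the same kind of joint decomposition that (116) applies to the pair $\{\mathbf{E}_G(t), \mathbf{E}_X(t)\}$.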

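Let us consider the simultaneous decomposition of $\{\mathbf{E}_G(t), \mathbf{E}_X(t)\}$ according to Lemma 7 as:

$$\mathbf{E}_G(t) = \mathbf{V}(t)^T \mathbf{\Sigma}_G(t) \mathbf{V}(t)$$

$$\mathbf{E}_X(t) = \mathbf{V}(t)^T \mathbf{\Sigma}_X(t) \mathbf{V}(t) \tag{116}$$

where $\mathbf{V}(t)$ is an invertible matrix and $\mathbf{\Sigma}_G(t)$ and $\mathbf{\Sigma}_X(t)$ are diagonal matrices. It will be convenient to define $\mathbf{Q}(t, \tau)$, for $\tau \ge 0$, according to

$$\mathbf{Q}(t, \tau) = \mathbf{V}(\tau)^{-T} \mathbf{Q}(t) \mathbf{V}(\tau)^{-1} \tag{117}$$

where $\mathbf{V}(\tau)$ is the same as defined in (116).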

The remainder of the proof is split into two parts. In the first part we will prove that each eigenvalue of $\mathbf{Q}(t, \tau)$ has at most a single negative-to-nonnegative zero crossing. In the second part, we will show that this property transfers to $\mathbf{Q}(t)$, thus completing the proof. Both parts of the proof are based on contradiction arguments, i.e., we assume that the opposite of what we want to prove is true and then end up with an inconsistency.

1) Single crossing point for the eigenvalues of $\mathbf{Q}(t, \tau)$: Let us start by presenting a result on the differentiability of the eigenvalues of a symmetric matrix with respect to some scalar parameter $t$, which was studied by Rellich in [23, Ch. 1]$^1$:

Lemma 8: [23, Th. in p. 57] Suppose that $\mathbf{A}(t)$ is an $n$-dimensional symmetric matrix defined on some open interval $t \in (t_1, t_2)$. Suppose that the derivative $D_t \mathbf{A}(t)$ exists and is continuous for each $t \in (t_1, t_2)$. Then, there exist $n$ functions $\lambda_i(t)$, $i = 1, \ldots, n$, with continuous derivatives in $t \in (t_1, t_2)$, such that

$$\mathbf{A}(t) \mathbf{u}_i(t) = \lambda_i(t) \mathbf{u}_i(t), \quad i = 1, \ldots, n \tag{118}$$

for some properly chosen orthonormal system of vectors $\mathbf{u}_i(t)$, $i = 1, \ldots, n$.

Since $\mathbf{Q}(t, \tau)$ is a symmetric matrix whose derivative $D_t \mathbf{Q}(t, \tau)$ exists, Lemma 8 ensures the existence of $n$ continuous and differentiable functions that are equal to the eigenvalues of the matrix $\mathbf{Q}(t, \tau)$, for any choice of $\tau$. These functions will be denoted from now on by $\lambda_i(t, \tau)$, for $i = 1, \ldots, n$.

Now, let us assume that, at $t = t_0$, $k$ of these eigenvalues (with $k \le n$) are equal to zero, i.e., $\lambda_i(t_0, \tau) = 0$, for $i = 1, \ldots, k$. Furthermore, we also assume that, from these $k$ eigenvalues that are zero at $t = t_0$, $s$ of them (with $s \le k$) have a nonnegative-to-negative zero crossing at $t = t_0$. To sum up, we assume that the differentiable functions $\lambda_i(t, \tau)$, with $i = 1, \ldots, s$, have a nonnegative-to-negative zero crossing at $t = t_0$.

Let us now present a property of differentiable functions that contain nonnegative-to-negative zero crossings:

Lemma 9: Assume that $f(t)$ has a nonnegative-to-negative zero crossing at $t = t_0$ and that $f(t)$ is differentiable in a neighborhood of $t_0$. Then, there exists a positive value $\epsilon$ such that

$$f(t) < 0, \quad t \in (t_0, t_0 + \epsilon), \tag{119}$$

$$D_t f(t) < 0, \quad t \in (t_0, t_0 + \epsilon). \tag{120}$$

Proof: From Definition 2, (119) follows immediately for any e < e. The proof for (120) follows easily from 
the mean value theorem and elementary calculus. ■ 

Applying Lemma 9 to the set of functions $\lambda_i(t, \tau)$, with $i = 1, \ldots, s$, we readily obtain:

$$\left. \begin{array}{ll} \lambda_i(t, \tau) < 0, & t \in (t_0, t_0 + \epsilon_i(\tau)) \\ D_t \lambda_i(t, \tau) < 0, & t \in (t_0, t_0 + \epsilon_i(\tau)) \end{array} \right\} \quad i = 1, \ldots, s \tag{121}$$

where we have written $\epsilon_i(\tau)$ to make explicit the dependence of $\epsilon_i$ on the specific value of $\tau$. For the sake of convenience, we want to eliminate the dependence of $\epsilon_i$ on $\tau$. A possible method to eliminate this dependence is to define

$$\epsilon^* = \inf_{\tau \in [t_0, t_0 + M]} \epsilon_i(\tau) = \min_{\tau \in [t_0, t_0 + M]} \epsilon_i(\tau) > 0 \tag{122}$$

where, for the sake of convenience, we have restricted the values of $\tau$ to the interval $[t_0, t_0 + M]$, with $M$ being an arbitrary fixed positive value (observe that since $\epsilon_i(\tau)$ can be made arbitrarily small we can always guarantee that $M \ge \epsilon_i(\tau) \ge \epsilon^*$), and where the second equality follows from the fact that the optimization set is a closed interval and the third one follows from $\epsilon_i(\tau) > 0$, $\forall \tau$.

$^1$ Rellich studied the eigenvalue differentiability for Hermitian matrices. We specialized his result for the real case studied in this paper.

Consequently, after this simplification, we have that, assuming that the differentiable functions $\lambda_i(t, \tau)$, with $i = 1, \ldots, s$, have a nonnegative-to-negative zero crossing at $t = t_0$, they must fulfill:

$$\left. \begin{array}{ll} \lambda_i(t, \tau) < 0, & t \in (t_0, t_0 + \epsilon^*) \\ D_t \lambda_i(t, \tau) < 0, & t \in (t_0, t_0 + \epsilon^*) \end{array} \right\} \quad i = 1, \ldots, s. \tag{123}$$

Now, we can particularize the expression above for the case where $t = t_0 + \epsilon^*/2 = t^*$ and where we also choose $\tau = t^*$. We obtain

$$\left. \begin{array}{l} \lambda_i(t^*, t^*) < 0 \\ D_t \lambda_i(t, t^*)|_{t = t^*} < 0 \end{array} \right\} \quad i = 1, \ldots, s. \tag{124}$$

From this point our goal is to prove that the two conditions in (124) cannot both hold at the same time. For that purpose, we need an expression for the derivative of the eigenvalue function $D_t \lambda_i(t, t^*)$. Since we have that $\lambda_i(t_0, \tau) = 0$ for $i = 1, \ldots, k$ (i.e., the multiplicity of the zero eigenvalue is $k$), we cannot guarantee that the multiplicity of the eigenvalue $\lambda_i(t^*, t^*)$ is equal to one. From this point, we assume that the multiplicity of $\lambda_i(t^*, t^*)$ is $l$.

Consequently, we now require the following result by Lancaster in [24, Th. 7] (it is also reproduced in [18, Ch. 8, Sec. 12, Th. 13]), which gives us an expression for the derivatives of the multiple eigenvalues$^2$:

Lemma 10: [24, Th. 7] Under the assumptions in Lemma 8, let us consider the case where $\mathbf{A}(t)$ has a repeated eigenvalue $\lambda_0$ with multiplicity $l$, i.e., $\lambda_1(t) = \lambda_2(t) = \ldots = \lambda_l(t) = \lambda_0$. Assume further that the $n \times l$ matrix $\mathbf{U}(t)$ spans the space associated with the repeated eigenvalues (i.e., $\mathbf{U}(t)$ contains one particular set of eigenvectors associated with the $l$ repeated eigenvalues). Then, the $l$ derivatives of the eigenvalues which coincide at $\lambda_0$ are the eigenvalues of the matrix

$$\mathbf{U}(t)^T D_t \mathbf{A}(t) \mathbf{U}(t). \tag{125}$$

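Using Lemma 10, we can write

$$D_t \lambda_i(t, \tau = t^*)|_{t = t^*} = \mu_i \left( \mathbf{1}_{n,l}^T \, D_t \mathbf{Q}(t, \tau = t^*)|_{t = t^*} \, \mathbf{1}_{n,l} \right) \tag{126}$$

$$= \mu_i \left( \mathbf{1}_{n,l}^T \mathbf{V}(t^*)^{-T} D_t \mathbf{Q}(\mathbf{X}, \mathbf{R}_{X_G}, t)|_{t = t^*} \mathbf{V}(t^*)^{-1} \mathbf{1}_{n,l} \right) \tag{127}$$

$$\ge \mu_i \left( \left[ \mathbf{\Sigma}_X(t^*) \mathbf{C}(t^*) \mathbf{\Sigma}_X(t^*) - \mathbf{\Sigma}_G(t^*) \mathbf{C}(t^*) \mathbf{\Sigma}_G(t^*) \right]_{1:l,1:l} \right) \tag{128}$$

$$= \mu_i \left( [\mathbf{\Sigma}_X(t^*)]_{1:l,1:l} [\mathbf{C}(t^*)]_{1:l,1:l} [\mathbf{\Sigma}_X(t^*)]_{1:l,1:l} - [\mathbf{\Sigma}_G(t^*)]_{1:l,1:l} [\mathbf{C}(t^*)]_{1:l,1:l} [\mathbf{\Sigma}_G(t^*)]_{1:l,1:l} \right) \tag{129}$$

where $\mu_i(\mathbf{A})$ denotes the eigenvalue function of a generic matrix $\mathbf{A}$. Observe that, thanks to the fact that $\mathbf{Q}(t^*, t^*) = \mathbf{\Sigma}_G(t^*) - \mathbf{\Sigma}_X(t^*)$ is a diagonal matrix, in (126) we have chosen

$$\mathbf{U} = \mathbf{1}_{n,l} = \begin{pmatrix} \mathbf{I}_l \\ \mathbf{0}_{n-l,l} \end{pmatrix} \tag{130}$$

with $\mathbf{I}_l$ being the $l \times l$ identity matrix and $\mathbf{0}_{n-l,l}$ being the $(n - l) \times l$ zero matrix. Moreover, in (128), we have used the fact that $\mathbf{A} \succeq \mathbf{B}$ implies both that $\mathbf{C}^T \mathbf{A} \mathbf{C} \succeq \mathbf{C}^T \mathbf{B} \mathbf{C}$ and that $\mu_i(\mathbf{A}) \ge \mu_i(\mathbf{B})$ [22, Cor. 7.7.4(c)], and the lower bound on the derivative of the matrix $\mathbf{Q}(t)$ given in Lemma 6. We further used the definition:

$$\mathbf{C}(t^*) = \mathbf{V}(t^*) \mathbf{B}(t^*) \mathbf{V}(t^*)^T. \tag{131}$$

Observe that, since $\mathbf{B}(t^*)$ is a positive semidefinite diagonal matrix (see Lemma 4), we have $\mathbf{C}(t^*) \succeq 0$, which further implies that $[\mathbf{C}(t^*)]_{ii} \ge 0$, for all $i$. Finally, the upper-left $l \times l$ sub-matrix of a matrix $\mathbf{A}$ has been denoted by $[\mathbf{A}]_{1:l,1:l}$, and the last transition in (129) is due to the fact that both $\mathbf{\Sigma}_X(t^*)$ and $\mathbf{\Sigma}_G(t^*)$ are diagonal matrices.

$^2$ The assumptions in [24, Th. 7] are different from those in Lemma 8, but, once existence of the derivatives of the eigenvalues has been established, their expression has to be the same.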

In order to proceed with the proof, we require the following lemma. 

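Lemma 11: Let us consider a positive semidefinite matrix $\mathbf{A}$ and two diagonal positive semidefinite matrices $\mathbf{D}_1$ and $\mathbf{D}_2$ such that $\mathbf{D}_1 \succeq \mathbf{D}_2 \succeq 0$. Then, we have that

$$\mu_{\max}(\mathbf{D}_1 \mathbf{A} \mathbf{D}_1 - \mathbf{D}_2 \mathbf{A} \mathbf{D}_2) \ge 0 \tag{132}$$

where $\mu_{\max}$ denotes the maximum eigenvalue function.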

Proof: See Appendix A7. ■ 
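
Lemma 11 is easy to probe numerically. The Python sketch below draws random instances and evaluates the maximum eigenvalue in (132); it is a sanity check under randomly generated test matrices (an assumption of the sketch), not a substitute for the proof in Appendix A7. The sketch also prints the trace of the difference, since $\mathrm{Tr}(\mathbf{D}_1 \mathbf{A} \mathbf{D}_1 - \mathbf{D}_2 \mathbf{A} \mathbf{D}_2) = \sum_i [\mathbf{A}]_{ii} ([\mathbf{D}_1]_{ii}^2 - [\mathbf{D}_2]_{ii}^2) \ge 0$ gives one elementary way to see why the largest eigenvalue cannot be negative. The function name `lemma11_instance` is hypothetical.

```python
import numpy as np

def lemma11_instance(n, rng):
    """Draw A >= 0 and diagonal D1 >= D2 >= 0; return (max eigenvalue, trace)."""
    G = rng.standard_normal((n, n))
    A = G @ G.T                               # positive semidefinite
    d2 = rng.uniform(0.0, 1.0, n)
    d1 = d2 + rng.uniform(0.0, 1.0, n)        # entrywise d1 >= d2 >= 0
    D1, D2 = np.diag(d1), np.diag(d2)
    M = D1 @ A @ D1 - D2 @ A @ D2             # symmetric, possibly indefinite
    M = 0.5 * (M + M.T)
    return np.linalg.eigvalsh(M)[-1], np.trace(M)

rng = np.random.default_rng(1)
results = [lemma11_instance(5, rng) for _ in range(200)]
max_eigs = np.array([r[0] for r in results])
traces = np.array([r[1] for r in results])

print("smallest maximum eigenvalue over all draws:", max_eigs.min())
print("smallest trace over all draws:", traces.min())
```

Note that the difference matrix itself need not be positive semidefinite; the lemma only controls its largest eigenvalue, which is all that the contradiction argument below requires.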

Now, using the fact that $\mathbf{C}(t^*)$ is positive semidefinite, and the first condition in (124) that $\lambda_i(t^*, t^*) < 0$ for $i = 1, \ldots, l$, which further implies that $[\mathbf{\Sigma}_X(t^*)]_{1:l,1:l} \succ [\mathbf{\Sigma}_G(t^*)]_{1:l,1:l} \succeq 0$, we can use Lemma 11 to conclude that

$$\mu_{\max} \left( [\mathbf{\Sigma}_X(t^*)]_{1:l,1:l} [\mathbf{C}(t^*)]_{1:l,1:l} [\mathbf{\Sigma}_X(t^*)]_{1:l,1:l} - [\mathbf{\Sigma}_G(t^*)]_{1:l,1:l} [\mathbf{C}(t^*)]_{1:l,1:l} [\mathbf{\Sigma}_G(t^*)]_{1:l,1:l} \right) \ge 0. \tag{133}$$

This last result, together with (126)-(129), implies that there exists some $i \in [1, l]$ such that $\lambda_i(t^*, t^*) < 0$ and $D_t \lambda_i(t, \tau = t^*)|_{t = t^*} \ge 0$, which clearly contradicts the conditions in (124).

Since the contradiction described above holds for any arbitrary values of $k$, $s$, and $l$ (under the condition $l \le s \le k \le n$), we have thus proved that no nonnegative-to-negative zero crossing can occur for the eigenvalues of $\mathbf{Q}(t, \tau)$ or, equivalently, we have proved that the eigenvalues of $\mathbf{Q}(t, \tau)$ have at most a single negative-to-nonnegative zero crossing of the horizontal axis.

2) Single crossing point for the eigenvalues of $\mathbf{Q}(t)$: The relation between the sign of the eigenvalues of $\mathbf{Q}(t)$ and those of $\mathbf{Q}(t, \tau)$ is stated in the following lemma.

Lemma 12: For all $\tau$ and as a function of $t$, the number of positive, zero, and negative eigenvalues of $\mathbf{Q}(t)$ and $\mathbf{Q}(t, \tau)$ coincide.

Proof: The proof follows straightforwardly from the definition of $\mathbf{Q}(t, \tau)$, given in equation (117), and Sylvester's law of inertia for congruent matrices [25, p. 5]. ■

In the first part of the proof we have shown that $\mathbf{Q}(t, \tau)$ has, for each eigenvalue, at most a single negative-to-nonnegative zero crossing. From this and Lemma 12, we can conclude that the number of negative eigenvalues of both functions cannot increase. Now, let us assume that $\mathbf{Q}(t)$ has an eigenvalue of multiplicity $s$ with a nonnegative-to-negative zero crossing at $t_0$, i.e., $\mu_i(\mathbf{Q}(t_0)) = 0$ and $\mu_i(\mathbf{Q}(t)) < 0$ for $t \in (t_0, t_0 + \epsilon)$, for some positive $\epsilon$ and for $i = 1, \ldots, s$. In order to refrain from increasing the number of negative eigenvalues, $s$ negative eigenvalues at $t_0$ must become zero. However, if we examine the number of eigenvalues at $t_0 + \Delta$ for a sufficiently small $\Delta$, the eigenvalues that were negative at $t_0$ are still negative at $t_0 + \Delta$, and the total number of negative eigenvalues has increased. This contradicts the possibility of a nonnegative-to-negative zero crossing of the multiplicity-$s$ eigenvalue of $\mathbf{Q}(t)$. This is valid for any arbitrary $t_0$, thus concluding our proof. ■

The following corollary is a simple consequence of Theorem 6.

Corollary 6: If for a given $t'$ the function $\mathbf{Q}(X, \mathbf{R}_{X_G}, t') \succeq \mathbf{0}$, then for all $t \ge t'$ the function $\mathbf{Q}(X, \mathbf{R}_{X_G}, t) \succeq \mathbf{0}$.

B. The Conditioned Case 

The results of the previous section can be simply extended to the conditioned case. Given an extension of the 
lower bound on the derivative of Q, the extension of all other results is trivial. Thus, we briefly give the extension 
to the lower bound with a full proof (given in Appendix A8) and then for completeness restate the main result of 
this paper, for the conditioned case, without detailing the proof, which follows identically to the proof given above. 

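Lemma 13: The following lower bound holds:

$$D_t \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t) \succeq 2\left( \mathbf{E}_{X|U}(t) \mathbf{B}(t) \mathbf{E}_{X|U}^T(t) - \mathbf{E}_G(t) \mathbf{B}(t) \mathbf{E}_G^T(t) \right) \qquad (134)$$

where $\mathbf{B}(t)$ was defined in (45), and is assumed to be a positive semidefinite diagonal matrix for all $t$ (see Lemma 4).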

Proof: See Appendix A8. ■ 

Thus, the following theorem follows: 

Theorem 7: Each eigenvalue of $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ has, at most, a single negative-to-nonnegative zero crossing of the horizontal axis.

Proof: The proof follows the same steps as those in the proof of Theorem 6. ■ 

C. Properties of the Mutual Information

So far we have seen the "single crossing point" property of the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$, or more precisely, of its eigenvalues. As seen, this property also extends naturally to the conditioned case. In this section our goal is to relate this result to the mutual information between the input and the output of a parallel Gaussian channel. As expected, the advantage of this result is in the comparison between the mutual information assuming that the input to the channel has an arbitrary distribution and the mutual information assuming that it has a Gaussian distribution with an arbitrary covariance, $\mathbf{R}_{X_G}$. Our goal is to make use of this result through the I-MMSE relationship, as given in equations (43)-(44) and (46)-(47). The results given in this section can be viewed as supporting theorems and lemmas that make our "single crossing point" property applicable through the use of the I-MMSE relationship.
For clarity we will write the results in this section only for the more general, conditioned case, from which one can easily derive the respective unconditioned theorems.

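According to equation (47), the difference between the mutual information assuming that the input to the channel has an arbitrary distribution and the mutual information assuming that it has a Gaussian distribution with an arbitrary covariance, $\mathbf{R}_{X_G}$, is

$$I(X_G; Y_G(t)) - I(X; Y(t)|U) = \int_{\tau=0}^{t} \mathrm{Tr}\left( \mathbf{B}(\tau) \left( \mathbf{E}_G(\tau) - \mathbf{E}_{X|U}(\tau) \right) \right) d\tau = \int_{\tau=0}^{t} \mathrm{Tr}\left( \mathbf{B}(\tau) \mathbf{Q}(X|U, \mathbf{R}_{X_G}, \tau) \right) d\tau. \qquad (135)$$

Thus, we are interested in the properties of

$$\mathrm{Tr}\left( \mathbf{B}(t) \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t) \right) = \sum_{i=1}^{n} \lambda_i\left( \mathbf{B}(t) \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t) \right) \qquad (136)$$

where we have used the fact that the trace of a matrix is the sum of its eigenvalues [22, Th. 1.2.12]. The following theorem extends the "single crossing point" property of the eigenvalues of $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ to the eigenvalues of $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$.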

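Theorem 8: Each eigenvalue of $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ has, at most, a single negative-to-nonnegative zero crossing of the horizontal axis. Moreover, the eigenvalues of $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ have the following property:

$$\mathrm{sign}\{\lambda_i(\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t))\} \in \{0, \mathrm{sign}\{\lambda_i(\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t))\}\}. \qquad (137)$$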

Proof: For a non-singular $\mathbf{B}(t)$ and due to similarity [22, Cor. 1.3.4] we can write the following:

$$\lambda_i(\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)) = \lambda_i(\mathbf{B}^{\frac{1}{2}}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)\mathbf{B}^{\frac{1}{2}}(t)). \qquad (138)$$

Recalling that $\mathbf{B}(t)$ is a positive semidefinite diagonal matrix, we have an eigenvalue of a congruent transformation. Thus, the proof follows similarly to the second part of the proof of Theorem 6 (given in Section V-A2), concluding the preservation of the signs of the eigenvalues of $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ in $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ and, as a result, concluding that all eigenvalues have, at most, a single negative-to-nonnegative zero crossing of the horizontal axis.

If $\mathbf{B}(t)$ is singular, we can assume without loss of generality that the $i$-th diagonal element is zero. Due to that, the $i$-th row of $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ is all zeros, that is, one of the eigenvalues of $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ is zero (and its sign is also zero). The rest of the eigenvalues can be calculated from the reduced problem, the matrix $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ without the $i$-th row and column. Recalling that $\mathbf{B}(t)$ is a diagonal matrix, this is simply the product of $\mathbf{B}(t)$ and $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$, both without the $i$-th row and column. This procedure can be repeated as long as the reduced $\mathbf{B}(t)$ matrix is singular. When the reduced matrix is non-singular, we again follow the proof of Theorem 6.

Thus, we have shown that the eigenvalues preserve the sign of the eigenvalues of $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ with the additional possibility of falling to zero when $\mathbf{B}(t)$ becomes singular. ■
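The sign property (137) can be checked on a small example. The sketch below uses a hypothetical symmetric matrix `Q` and two hypothetical diagonal matrices, one non-singular (`B_full`) and one singular (`B_sing`); none of these values come from the paper, and the comparison is made between eigenvalues sorted in ascending order.

```python
import numpy as np

def sorted_signs(w, tol=1e-9):
    """Signs (-1, 0, +1) of the eigenvalues, listed in ascending eigenvalue order."""
    w = np.sort(np.real(w))
    return [0 if abs(x) <= tol else (1 if x > 0 else -1) for x in w]

# Hypothetical symmetric Q with one negative and two positive eigenvalues.
Q = np.array([[1.0, 2.0, 0.0],
              [2.0, -1.0, 1.0],
              [0.0, 1.0, 3.0]])
B_full = np.diag([0.5, 2.0, 1.5])   # non-singular positive diagonal B
B_sing = np.diag([0.5, 0.0, 1.5])   # singular B: second diagonal entry is zero

signs_Q = sorted_signs(np.linalg.eigvalsh(Q))
signs_full = sorted_signs(np.linalg.eigvals(B_full @ Q))   # same signs as Q
signs_sing = sorted_signs(np.linalg.eigvals(B_sing @ Q))   # a sign may drop to 0
```

With the non-singular `B_full` the sign pattern of `Q` is reproduced exactly, while with `B_sing` the zeroed row forces one eigenvalue to zero, matching the two cases treated in the proof.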

The next two lemmas provide the link between the above results, regarding the behavior of the eigenvalues of the matrix $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ and the matrix $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$, and the mutual information. Thus, they facilitate the usage of these results on information theory problems, as will be shown in the sequel. More particularly, so far we discussed the behavior of each and every eigenvalue of the matrix $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ and the matrix $\mathbf{B}(t)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$, which holds true for any proper choice of $\mathbf{R}_{X_G}$ with no regard to the random vector $X$. The next two lemmas identify the existence of specific Gaussian inputs which have unique properties with respect to the given random vector $X$.

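Lemma 14: Assume $X \in \mathbb{R}^n$ is an arbitrarily distributed random vector. For any $t_e \in [0, \infty)$, there exists a Gaussian input covariance matrix $\mathbf{R}_{X_G}$ such that the following hold:

1) $\mathbf{R}_{X_G} \preceq \mathbf{R}_X$

2) $I(X; Y(t_e)|U) = I(X_G; Y_G(t_e))$

3) $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \succeq \mathbf{0}$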

Proof: See Appendix A9. ■ 

Note that the above claim can be extended to a general non-singular $\mathbf{H}(t_e)$, that is, not necessarily diagonal, by defining $\tilde{X} = \mathbf{H}(t_e)X$. Due to the non-singularity of $\mathbf{H}(t_e)$, the mutual information is unchanged, i.e., $I(\tilde{X}; Y(t_e)|U) = I(X; Y(t_e)|U)$. Requirements 1 and 3 are preserved under any congruent transformation, specifically under the transformation $\mathbf{H}^{-1}(t_e)$.

The next lemma is an extension of Lemma 14 that will prove useful in the sequel. 

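Lemma 15: Assume that for a given input distribution on the pair $(U, X)$ there exists a Gaussian random vector, $X_G^{u}$, with covariance $\mathbf{R}_{X_G}^{u}$ such that for some $t_e \in [0, \infty)$ we have that

1) $I(X; Y(t_e)|U) \le I(X_G^{u}; Y_G^{u}(t_e))$

2) $\mathbf{Q}(X|U, \mathbf{R}_{X_G}^{u}, t_e) \succeq \mathbf{0}$

Then, there exists a Gaussian random vector, $X_G$, with covariance $\mathbf{R}_{X_G}$ such that the following hold:

1) $\mathbf{R}_{X_G} \preceq \mathbf{R}_{X_G}^{u}$

2) $I(X; Y(t_e)|U) = I(X_G; Y_G(t_e))$

3) $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \succeq \mathbf{0}$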

Proof: The proof follows the proof of Lemma 14, where instead of using $\mathbf{E}_X^{L}(t_e)$ (209) as a trivial upper bound we use:

$$\mathbf{E}_G^{u}(t_e) = \mathbf{I} - (\mathbf{R}_{X_G}^{u} + \mathbf{I})^{-1} \qquad (139)$$

and the assumptions stated above. ■

D. Connections to Fisher Information 

In addition to the MMSE matrix, another important quantity in estimation theory is the Fisher information matrix [26]. Its connection to information theory was established in the late 1950s and has been attributed to de Bruijn [8]. The de Bruijn identity relates the derivative of the differential entropy to the Fisher information matrix defined as$^1$:

$$\mathbf{J}(Y) = \mathrm{E}\left\{ [\nabla \log P_Y(Y)] [\nabla \log P_Y(Y)]^T \right\} \qquad (140)$$

where the expectation is over $Y$. Note that this is a special form of the Fisher information matrix (with respect to a translation parameter) which does not involve an explicit parameter as in its most general definition [26]. In [7] the authors have shown that the de Bruijn identity is equivalent to the I-MMSE relationship. Using this connection, the de Bruijn identity has been extended to a multivariate version in [17, Th. 4]. For our purposes we will use the following notation:

$$\mathbf{J}_X(\mathbf{H}) = \mathbf{J}(\mathbf{H}X + N) \qquad (141)$$

when we have some arbitrary input distribution on the random vector $X$. For the case of a Gaussian distribution on $X$ with covariance matrix $\mathbf{R}_{X_G}$ we will write $\mathbf{J}_G(\mathbf{R}_{X_G}, \mathbf{H})$. We further note that, as in the case of the MMSE matrix, whenever the channel coefficients depend on other parameters, $\mathbf{H} = \mathbf{H}(\theta)$, we will write $\mathbf{J}_X(\theta)$. We can now extend the idea of the matrix $\mathbf{Q}$ to the Fisher information, using the following definition:

$$\mathbf{W}(X, \mathbf{R}_{X_G}, \theta) = \mathbf{J}_X(\theta) - \mathbf{J}_G(\mathbf{R}_{X_G}, \theta). \qquad (142)$$

As in the case of the matrix $\mathbf{Q}$, the matrix $\mathbf{W}$ has some distinct properties. Using the relationship between the two matrices we can derive these properties directly from the results of the previous sections. We first require the following lemma, given by Palomar and Verdú in [17].

Lemma 16: [17, App. E] Assuming the Gaussian additive noise channel (5), the following connection between the Fisher information matrix and the MMSE matrix holds:

$$\mathbf{J}_Y = \mathbf{I}_n - \mathbf{H}\mathbf{E}_X\mathbf{H}^T \qquad (143)$$

Proof: The result follows directly from equation (106) in [17] by setting $\mathbf{\Sigma}_N$ equal to the identity matrix and recalling that the MMSE matrix in (106) is the MMSE matrix of $Z = \mathbf{H}X$, from which it follows that $\mathbf{E}_Z = \mathbf{H}\mathbf{E}_X\mathbf{H}^T$. ■

We can now state the main result of this section. 

Theorem 9: The matrix $\mathbf{W}(X, \mathbf{R}_{X_G}, t)$ is related to the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$ as follows:

$$\mathbf{W}(X, \mathbf{R}_{X_G}, t) = \mathbf{H}(t)\mathbf{Q}(X, \mathbf{R}_{X_G}, t)\mathbf{H}(t)^T. \qquad (144)$$

Moreover, the properties given in Sections IV and V for the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$ transfer to the matrix $\mathbf{W}(X, \mathbf{R}_{X_G}, t)$.

Proof: Equation (144) is obtained through the use of Lemma 16. The properties given in Section IV regarding the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$ transfer to the matrix $\mathbf{W}(X, \mathbf{R}_{X_G}, t)$, due to the fact that $\mathbf{H}(t)$ is a diagonal positive semidefinite matrix for all $t$. The properties given in Section V regarding the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$ transfer to the matrix $\mathbf{W}(X, \mathbf{R}_{X_G}, t)$, since it is simply a congruent transformation of $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$ (this was explained in detail in part two of the proof of Theorem 6). ■
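For Gaussian inputs both sides of (143) and (144) have closed forms, which allows a quick numerical sanity check. The sketch below is restricted to that special case: it assumes that the translation-form Fisher information of a zero-mean Gaussian vector is the inverse of its covariance, and it lets a second Gaussian input stand in for the arbitrary input $X$. The matrices `H`, `R1`, `R2` and the function names are hypothetical.

```python
import numpy as np

def mmse_gaussian(R, H):
    """MMSE matrix of X ~ N(0, R) observed through Y = H X + N, N ~ N(0, I)."""
    n = H.shape[0]
    return R - R @ H.T @ np.linalg.inv(H @ R @ H.T + np.eye(n)) @ H @ R

def fisher_gaussian(R, H):
    """Fisher information of Y = H X + N for Gaussian X: inverse output covariance."""
    n = H.shape[0]
    return np.linalg.inv(H @ R @ H.T + np.eye(n))

H = np.diag([0.5, 1.0, 2.0])
R1 = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
R2 = np.diag([1.0, 0.5, 0.7])

# Lemma 16 in the Gaussian case: J = I - H E H^T.
lhs = fisher_gaussian(R1, H)
rhs = np.eye(3) - H @ mmse_gaussian(R1, H) @ H.T
lemma16_gap = float(np.max(np.abs(lhs - rhs)))

# Theorem 9 with R1 in the role of X and R2 in the role of X_G:
# W = J_X - J_G and Q = E_G - E_X should satisfy W = H Q H^T.
W = fisher_gaussian(R1, H) - fisher_gaussian(R2, H)
Q = mmse_gaussian(R2, H) - mmse_gaussian(R1, H)
theorem9_gap = float(np.max(np.abs(W - H @ Q @ H.T)))
```

Both gaps should vanish up to floating-point rounding; the check says nothing about non-Gaussian inputs, for which the MMSE matrix has no closed form in general.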

$^1$ For any differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, its gradient at any $y$ is a column vector $\nabla f(y) = [D_{y_1} f(y), \ldots, D_{y_n} f(y)]^T$.
E. Application: The Degraded Parallel Gaussian BC Capacity Region under Covariance Constraint

In this section we show that the result of Section V-C can be used to provide a converse proof for the degraded parallel Gaussian BC capacity region under a covariance constraint. We consider the following model:

$$\begin{aligned} Y_1[m] &= \mathbf{H}_1 X[m] + N_1[m] \\ Y_2[m] &= \mathbf{H}_2 X[m] + N_2[m] \end{aligned} \qquad (145)$$

where $N_1[m]$ and $N_2[m]$ are standard additive Gaussian noise vectors independent for different time indices $m$, and $\mathbf{H}_1$ and $\mathbf{H}_2$ are diagonal positive semidefinite matrices such that $\mathbf{H}_1 \preceq \mathbf{H}_2$. $X \in \mathbb{R}^n$ is the random input vector, and it is assumed independent for different time indices $m$.
We consider a covariance constraint:

$$\mathbf{R}_X \preceq \mathbf{S} \qquad (146)$$

where $\mathbf{S}$ is some positive definite matrix.

Since we have a degraded BC, we can use the single-letter expression as given in (100). As in Section IV-D, we will follow the proof given for the scalar Gaussian BC in [6], [20]. Using Lemma 4 we can construct a path such that:

$$\begin{aligned} \mathbf{H}(t_2) &= \mathbf{H}_2 \\ \mathbf{H}(t_1) &= \mathbf{H}_1 \\ \mathbf{H}(0) &= \mathbf{0} \end{aligned} \qquad (147)$$

where $0 < t_1 < t_2$ and $\mathbf{H}(t)$ is diagonal for all $t \in [0, t_2]$.

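Now, assume a pair $(U, X)$ with covariance $\mathbf{R}_X$ for $X$. According to Lemma 14, there exists a Gaussian random vector with covariance $\mathbf{R}_{X_G}$ such that the following properties hold:

1) $\mathbf{R}_{X_G} \preceq \mathbf{R}_X$.

2) $I(X; Y(t_1)|U) = I(X_G; Y_G(t_1))$.

3) $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t) \succeq \mathbf{0}$ for all $t \ge t_1$.

Using the I-MMSE relationship (47) we can write,

$$I(X_G; \mathbf{H}(t)X_G + N) - I(X; \mathbf{H}(t)X + N|U) = \int_{\tau=0}^{t} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, \tau) \right) d\tau \qquad (148)$$

$$= \int_{\tau=0}^{t} \sum_{i=1}^{n} \lambda_i\left( \mathbf{B}(\tau)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, \tau) \right) d\tau. \qquad (149)$$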
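Using the above properties on (149) we have that for any $t' \ge t_1$,

$$I(X_G; \mathbf{H}(t')X_G + N) - I(X; \mathbf{H}(t')X + N|U) = \int_{\tau=0}^{t_1} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, \tau) \right) d\tau + \int_{\tau=t_1}^{t'} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, \tau) \right) d\tau \qquad (150)$$

$$= 0 + \int_{\tau=t_1}^{t'} \sum_{i=1}^{n} \lambda_i\left( \mathbf{B}(\tau)\mathbf{Q}(X|U, \mathbf{R}_{X_G}, \tau) \right) d\tau \ge 0 \qquad (151)$$

where (151) follows from property 2, and the inequality follows from property 3 and Theorem 8.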

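Thus, we have shown the existence of a Gaussian random vector, $X_G$, with covariance matrix $\mathbf{R}_{X_G}$ with the following properties:

$$\begin{aligned} I(X; Y(t_1)|U) &= I(X_G; Y_G(t_1)) \\ I(X; Y(t_2)|U) &\le I(X_G; Y_G(t_2)) \\ \mathbf{R}_{X_G} &\preceq \mathbf{R}_X \end{aligned} \qquad (152)$$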

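Using these properties on the single-letter expression (100) we obtain the following outer bound:

$$R_1 \le I(U; Y_1) = I(X; Y_1) - I(X; Y_1|U) \le \frac{1}{2}\log\left| \mathbf{I} + \mathbf{H}_1\mathbf{S}\mathbf{H}_1^T \right| - \frac{1}{2}\log\left| \mathbf{I} + \mathbf{H}_1\mathbf{R}_{X_G}\mathbf{H}_1^T \right| = \frac{1}{2}\log\frac{\left| \mathbf{I} + \mathbf{H}_1\mathbf{S}\mathbf{H}_1^T \right|}{\left| \mathbf{I} + \mathbf{H}_1\mathbf{R}_{X_G}\mathbf{H}_1^T \right|} \qquad (153)$$

$$R_2 \le I(X; Y_2|U) \le \frac{1}{2}\log\left| \mathbf{I} + \mathbf{H}_2\mathbf{R}_{X_G}\mathbf{H}_2^T \right| \qquad (154)$$

This outer bound is tight and the achievability is well-known using superposition coding. This approach can be extended to the M-user scenario as shown in Appendix C.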
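To make the outer bound (153)-(154) concrete, the following sketch evaluates its right-hand sides for a few Gaussian covariances of the form $a\mathbf{S}$ with $0 \le a \le 1$. This one-parameter family, the example matrices, and the function names are assumptions for illustration only; rates are in nats.

```python
import numpy as np

def half_logdet(M):
    """0.5 * log det(M) in nats, for a positive definite M."""
    return 0.5 * np.linalg.slogdet(M)[1]

def outer_bound_rates(H1, H2, S, R):
    """Right-hand sides of (153) and (154) for a covariance R with 0 <= R <= S."""
    I = np.eye(S.shape[0])
    r1 = half_logdet(I + H1 @ S @ H1.T) - half_logdet(I + H1 @ R @ H1.T)
    r2 = half_logdet(I + H2 @ R @ H2.T)
    return r1, r2

H1 = np.diag([0.5, 1.0])                 # weaker user
H2 = np.diag([1.0, 2.0])                 # stronger user, H1 <= H2
S = np.array([[2.0, 0.5], [0.5, 1.0]])   # covariance constraint

# Sweep a hypothetical family R = a * S, which satisfies 0 <= R <= S.
rates = [outer_bound_rates(H1, H2, S, a * S) for a in (0.0, 0.5, 1.0)]
```

Sweeping `a` trades the bound on $R_1$ against the bound on $R_2$: at `a = 0` all of the covariance budget serves user 1, and at `a = 1` all of it serves user 2.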

F. Application: The Compound Degraded Parallel Gaussian BC Capacity Region under Covariance Constraint 

In this section we show that the results of Section V-A can also be used to provide a converse proof for the compound degraded parallel Gaussian BC capacity region under a covariance constraint. We consider the following model:

$$Y^{j}_{i_j}[m] = \mathbf{H}^{j}_{i_j} X[m] + N^{j}_{i_j}[m], \quad j = 1, \ldots, M, \quad i_j = 1, \ldots, K_j \qquad (155)$$

where $N^{j}_{i_j}$, $j = 1, \ldots, M$, $i_j = 1, \ldots, K_j$ are standard additive Gaussian noise vectors independent for different time indices $m$, and $\mathbf{H}^{j}_{i_j}$, $j = 1, \ldots, M$, $i_j = 1, \ldots, K_j$ are diagonal positive definite matrices such that:

$$\mathbf{H}^{j}_{i_j} \preceq \mathbf{H}^{j+1}_{i_{j+1}} \quad \forall j = 1, \ldots, M-1, \; i_j \in \{1, \ldots, K_j\}, \; i_{j+1} \in \{1, \ldots, K_{j+1}\}. \qquad (156)$$

Since these matrices are diagonal, there exist matrices $\mathbf{H}^{*}_{(j+1)j}$ for $j = 1, \ldots, M-1$ such that

$$\mathbf{H}^{j}_{i_j} \preceq \mathbf{H}^{*}_{(j+1)j} \preceq \mathbf{H}^{j+1}_{i_{j+1}} \quad \forall j = 1, \ldots, M-1, \; i_j \in \{1, \ldots, K_j\}, \; i_{j+1} \in \{1, \ldots, K_{j+1}\}. \qquad (157)$$
Note that the equivalence between conditions (156) and (157) is not true in general (for non-diagonal matrices), as explained in [27]. $X \in \mathbb{R}^n$ is the random input vector, and it is assumed independent for different time indices $m$.
We consider a covariance constraint:

$$\mathbf{R}_X \preceq \mathbf{S} \qquad (158)$$

where $\mathbf{S}$ is some positive definite matrix.

Before proceeding, we provide the following single-letter expression for the capacity region of this M user 
memoryless channel. This is a simple extension of [27, Lem. 4]. 



Lemma 17: Consider a memoryless compound BC with input $X$, $M$ outputs $Y^{j}_{i_j}$, $j = 1, \ldots, M$, $i_j = 1, \ldots, K_j$, and auxiliary random outputs $Y^{*}_{(j+1)j}$ with $j \in \{1, \ldots, M-1\}$. All outputs are defined by their conditional probability functions: $P_{Y^{j}_{i_j}|X}$ and $P_{Y^{*}_{(j+1)j}|X}$. Furthermore, assume that these outputs are stochastically degraded such that there exists some distribution such that $X - Y^{M}_{i_M} - Y^{*}_{M(M-1)} - Y^{M-1}_{i_{M-1}} - Y^{*}_{(M-1)(M-2)} - \cdots - Y^{2}_{i_2} - Y^{*}_{21} - Y^{1}_{i_1}$ form a Markov chain for every choice of $i_1, i_2, \ldots, i_M$. The capacity region of this channel is given by the union of the rate tuples satisfying

$$R_j \le \min_{i_j = 1, \ldots, K_j} I\left( V_j; Y^{j}_{i_j} \,\middle|\, V_{j-1} \right), \quad j = 1, \ldots, M \qquad (159)$$

where $V_0 = \emptyset$, $V_M = X$ and the union is over all probability distributions satisfying

$$V_0 - V_1 - \cdots - V_{M-1} - V_M = X - Y^{M}_{i_M} - Y^{*}_{M(M-1)} - Y^{M-1}_{i_{M-1}} - Y^{*}_{(M-1)(M-2)} - \cdots - Y^{2}_{i_2} - Y^{*}_{21} - Y^{1}_{i_1}. \qquad (160)$$

Proof: See Appendix D. ■ 

Using Lemma 17 we prove the following theorem. 

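Theorem 10: The capacity region of the compound degraded parallel Gaussian BC (155) is given by the following expression:

$$\begin{aligned} R_M &\le \min_{i_M = 1, \ldots, K_M} \frac{1}{2}\log\left| \mathbf{H}^{M}_{i_M}\mathbf{R}_{G_M}\left(\mathbf{H}^{M}_{i_M}\right)^T + \mathbf{I} \right| \\ R_j &\le \min_{i_j = 1, \ldots, K_j} \frac{1}{2}\log\frac{\left| \mathbf{H}^{j}_{i_j}\left(\sum_{i=j}^{M}\mathbf{R}_{G_i}\right)\left(\mathbf{H}^{j}_{i_j}\right)^T + \mathbf{I} \right|}{\left| \mathbf{H}^{j}_{i_j}\left(\sum_{i=j+1}^{M}\mathbf{R}_{G_i}\right)\left(\mathbf{H}^{j}_{i_j}\right)^T + \mathbf{I} \right|}, \quad \forall j = 1, \ldots, M-1 \end{aligned} \qquad (161)$$

where $\mathbf{R}_{G_j}$ are some positive semidefinite matrices such that $\mathbf{0} \preceq \sum_{i=1}^{M}\mathbf{R}_{G_i} \preceq \mathbf{S}$.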

Proof: According to Lemma 4 (and the remark after this lemma) for any set of $\{i_1, i_2, \ldots, i_M\}$ where $i_j \in \{1, \ldots, K_j\}$ we can construct a diagonal path such that

$$\begin{aligned} \mathbf{H}(t_{i_j}) &= \mathbf{H}^{j}_{i_j}, \quad j = 1, \ldots, M \\ \mathbf{H}(t_{(j+1)j}) &= \mathbf{H}^{*}_{(j+1)j}, \quad j = 1, \ldots, M-1 \\ \mathbf{H}(t = 0) &= \mathbf{0} \end{aligned} \qquad (162)$$

with $0 \le t_{i_1} \le t_{21} \le t_{i_2} \le \cdots \le t_{i_j} \le t_{(j+1)j} \le t_{i_{j+1}} \le \cdots \le t_{i_M}$. Now, let us examine a tuple of rates on the boundary of the capacity region: $(R_1^{o}, R_2^{o}, \ldots, R_M^{o})$. Assume that this tuple has been attained by the joint distribution $P_{V_1, \ldots, V_{M-1}, X}$ on the tuple with covariance $\mathbf{R}_X \preceq \mathbf{S}$ as required by the constraint (158).
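We begin by looking at the following partial Markov chain:

$$V_0 - V_1 - \cdots - V_{M-1} - V_M = X - Y^{*}_{M(M-1)} - Y^{*}_{(M-1)(M-2)} - \cdots - Y^{*}_{21}. \qquad (163)$$

Now, assuming that $Y^{*}_{(j+1)j}$ are the outputs, we can use Lemma 22 which states that there exist $M$ Gaussian inputs $X_{G_j}$, with covariance matrices $\mathbf{R}_{X_{G_j}}$, such that

$$\int_{\tau=0}^{t_{(j+1)j}} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, \tau) \right) d\tau = 0 \qquad (164)$$

$$\int_{\tau=0}^{t_{(j+2)(j+1)}} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, \tau) \right) d\tau \ge 0, \quad \forall j = 1, \ldots, M-2 \qquad (165)$$

and such that $\mathbf{0} \preceq \mathbf{R}_{X_{G_j}} \preceq \mathbf{R}_{X_{G_{j-1}}}$ for $j = 2, \ldots, M-1$ and $\mathbf{0} \preceq \mathbf{R}_{X_{G_1}} \preceq \mathbf{S}$. Furthermore,

$$\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, t_{(j+1)j}) \succeq \mathbf{0} \qquad (166)$$

for all $j = 1, \ldots, M-1$.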

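Using this result, and according to Corollary 6, we know that $\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, t) \succeq \mathbf{0}$ for all $t \ge t_{(j+1)j}$. This holds for any diagonal path such that $\mathbf{H}(t_{(j+1)j}) = \mathbf{H}^{*}_{(j+1)j}$. Now, using Theorem 8 and (164) we can conclude that

$$\begin{aligned} \int_{\tau=0}^{t} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, \tau) \right) d\tau &\le 0, \quad \forall t \le t_{(j+1)j} \\ \int_{\tau=0}^{t} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, \tau) \right) d\tau &\ge 0, \quad \forall t \ge t_{(j+1)j}. \end{aligned} \qquad (167)$$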



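Due to the Markov chain

$$V_j - V_{j+1} - X - Y^{j+1}_{i_{j+1}} - Y^{*}_{(j+1)j} - Y^{j}_{i_j} \qquad (168)$$

(167) is particularly valid for

$$\begin{aligned} \int_{\tau=0}^{t_{i_j}} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, \tau) \right) d\tau &\le 0, \quad \forall i_j \in \{1, \ldots, K_j\} \\ \int_{\tau=0}^{t_{i_{j+1}}} \mathrm{Tr}\left( \mathbf{B}(\tau)\mathbf{Q}(X|V_j, \mathbf{R}_{X_{G_j}}, \tau) \right) d\tau &\ge 0, \quad \forall i_{j+1} \in \{1, \ldots, K_{j+1}\} \end{aligned} \qquad (169)$$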

March 1, 2013 DRAFT 



for any $j = 1, \ldots, M-1$. Equation (159) can be written explicitly as follows:






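$$\begin{aligned} R_M &\le \min_{i_M = 1, \ldots, K_M} I\left( X; Y^{M}_{i_M} \,\middle|\, V_{M-1} \right) \\ R_{M-1} &\le \min_{i_{M-1} = 1, \ldots, K_{M-1}} \left[ I\left( X; Y^{M-1}_{i_{M-1}} \,\middle|\, V_{M-2} \right) - I\left( X; Y^{M-1}_{i_{M-1}} \,\middle|\, V_{M-1} \right) \right] \\ R_{M-2} &\le \min_{i_{M-2} = 1, \ldots, K_{M-2}} \left[ I\left( X; Y^{M-2}_{i_{M-2}} \,\middle|\, V_{M-3} \right) - I\left( X; Y^{M-2}_{i_{M-2}} \,\middle|\, V_{M-2} \right) \right] \\ &\;\;\vdots \\ R_2 &\le \min_{i_2 = 1, \ldots, K_2} \left[ I\left( X; Y^{2}_{i_2} \,\middle|\, V_1 \right) - I\left( X; Y^{2}_{i_2} \,\middle|\, V_2 \right) \right] \\ R_1 &\le \min_{i_1 = 1, \ldots, K_1} \left[ I\left( X; Y^{1}_{i_1} \,\middle|\, V_0 = \emptyset \right) - I\left( X; Y^{1}_{i_1} \,\middle|\, V_1 \right) \right]. \end{aligned} \qquad (170)$$

Using (169) and the trivial bound on $I(X; Y^{1}_{i_1})$ we can upper bound these expressions as follows: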



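$$\begin{aligned} R_M &\le \min_{i_M = 1, \ldots, K_M} I\left( X_{G_{M-1}}; \mathbf{H}^{M}_{i_M} X_{G_{M-1}} + N \right) \\ R_{M-1} &\le \min_{i_{M-1} = 1, \ldots, K_{M-1}} \left[ I\left( X_{G_{M-2}}; \mathbf{H}^{M-1}_{i_{M-1}} X_{G_{M-2}} + N \right) - I\left( X_{G_{M-1}}; \mathbf{H}^{M-1}_{i_{M-1}} X_{G_{M-1}} + N \right) \right] \\ R_{M-2} &\le \min_{i_{M-2} = 1, \ldots, K_{M-2}} \left[ I\left( X_{G_{M-3}}; \mathbf{H}^{M-2}_{i_{M-2}} X_{G_{M-3}} + N \right) - I\left( X_{G_{M-2}}; \mathbf{H}^{M-2}_{i_{M-2}} X_{G_{M-2}} + N \right) \right] \\ &\;\;\vdots \\ R_2 &\le \min_{i_2 = 1, \ldots, K_2} \left[ I\left( X_{G_1}; \mathbf{H}^{2}_{i_2} X_{G_1} + N \right) - I\left( X_{G_2}; \mathbf{H}^{2}_{i_2} X_{G_2} + N \right) \right] \\ R_1 &\le \min_{i_1 = 1, \ldots, K_1} \left[ \frac{1}{2}\log\left| \mathbf{I} + \mathbf{H}^{1}_{i_1}\mathbf{S}\left(\mathbf{H}^{1}_{i_1}\right)^T \right| - I\left( X_{G_1}; \mathbf{H}^{1}_{i_1} X_{G_1} + N \right) \right]. \end{aligned} \qquad (171)$$

Defining,

$$\begin{aligned} \mathbf{R}_{G_1} &= \mathbf{S} - \mathbf{R}_{X_{G_1}} \\ \mathbf{R}_{G_j} &= \mathbf{R}_{X_{G_{j-1}}} - \mathbf{R}_{X_{G_j}}, \quad \forall j = 2, \ldots, M-1 \\ \mathbf{R}_{G_M} &= \mathbf{R}_{X_{G_{M-1}}} \end{aligned} \qquad (172)$$

(171) becomes the following set of upper bounds: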



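$$\begin{aligned} R_M &\le \min_{i_M = 1, \ldots, K_M} \frac{1}{2}\log\left| \mathbf{I} + \mathbf{H}^{M}_{i_M}\mathbf{R}_{G_M}\left(\mathbf{H}^{M}_{i_M}\right)^T \right| \\ R_j &\le \min_{i_j = 1, \ldots, K_j} \frac{1}{2}\log\frac{\left| \mathbf{I} + \mathbf{H}^{j}_{i_j}\left(\sum_{i=j}^{M}\mathbf{R}_{G_i}\right)\left(\mathbf{H}^{j}_{i_j}\right)^T \right|}{\left| \mathbf{I} + \mathbf{H}^{j}_{i_j}\left(\sum_{i=j+1}^{M}\mathbf{R}_{G_i}\right)\left(\mathbf{H}^{j}_{i_j}\right)^T \right|}, \quad \forall j = 1, \ldots, M-1 \end{aligned} \qquad (173)$$

where $\mathbf{R}_{G_j}$ are some positive semidefinite matrices such that $\mathbf{0} \preceq \sum_{i=1}^{M}\mathbf{R}_{G_i} = \mathbf{S}$.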

The above upper bounds can be attained simultaneously using a joint Gaussian distribution on the tuple

$$(V_0 = \emptyset, V_1, \ldots, V_{M-1}, V_M = X) \qquad (174)$$

as follows:

$$V_j = V_{j-1} + U_j \qquad (175)$$

where $U_j \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_{G_j})$ for $j = 1, \ldots, M$, independent of each other, and where $\mathbf{R}_{G_j}$ are positive semidefinite matrices such that $\sum_{i=1}^{M}\mathbf{R}_{G_i} \preceq \mathbf{S}$. This concludes the proof of the capacity region. ■
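The bookkeeping in (172)-(173) can be sketched in code: a nested chain of covariances is split into increments $\mathbf{R}_{G_j}$, and each user's bound is the minimum over that user's possible channel matrices. All matrices below (three users, two channel realizations for users 1 and 3) and the function names are hypothetical example values chosen to satisfy the ordering (156); rates are in nats.

```python
import numpy as np

def half_logdet(M):
    """0.5 * log det(M) in nats, for a positive definite M."""
    return 0.5 * np.linalg.slogdet(M)[1]

def increments(S, R_nested):
    """Split S as in (172), given R_XG1 >= R_XG2 >= ... >= R_XG(M-1)."""
    chain = [S] + list(R_nested) + [np.zeros_like(S)]
    return [chain[j] - chain[j + 1] for j in range(len(chain) - 1)]

def rate_bounds(H_sets, R_G):
    """Right-hand sides of (173); H_sets[j] holds the channel matrices of user j + 1."""
    I = np.eye(R_G[0].shape[0])
    rates = []
    for j, Hs in enumerate(H_sets):
        num_cov = sum(R_G[j:])        # increments of this user and all stronger users
        den_cov = num_cov - R_G[j]    # same sum without this user's own increment
        rates.append(min(half_logdet(I + H @ num_cov @ H.T)
                         - half_logdet(I + H @ den_cov @ H.T) for H in Hs))
    return rates

S = np.diag([2.0, 1.0])
R_G = increments(S, [0.6 * S, 0.2 * S])                  # M = 3 users
H_sets = [[np.diag([0.3, 0.4]), np.diag([0.5, 0.2])],    # user 1 (weakest)
          [np.diag([0.8, 0.9])],                         # user 2
          [np.diag([1.5, 2.0]), np.diag([1.2, 2.5])]]    # user 3 (strongest)
rates = rate_bounds(H_sets, R_G)
```

By construction the increments sum to `S`, mirroring the telescoping sum implied by (172), and the minimum inside `rate_bounds` reflects that each user's rate must be supported by the worst channel in that user's set.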

VI. Summary 

In this work we extended the "single crossing point" property from the scalar setting to the parallel MIMO setting. 
We have shown three different "single crossing point" properties, given in three phases of extension from scalar-to- 
vector. These properties cannot be trivially deduced from each other. All three emphasize the basic optimality of the 
Gaussian input distribution in the Gaussian regime. The most general of these properties, given in the third phase, 
shows a "single crossing point" property for each of the eigenvalues of the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$, the difference between the MMSE matrix assuming an arbitrary Gaussian input, and the MMSE matrix assuming an arbitrary input distribution. We demonstrate the applicability of these properties on several information theoretic problems: a proof of a special case of Shannon's vector EPI, a converse proof of the capacity region of the parallel degraded MIMO broadcast channel (BC) under per-antenna power constraints and under a covariance constraint, and a converse proof of the capacity region of the compound parallel degraded MIMO BC under a covariance constraint.

An open question is: can we extend the "single crossing point" property to the general MIMO channel? Note 
that, although the optimality of the Gaussian input is known for several MIMO Gaussian multi-terminal problems, 
we cannot necessarily conclude the existence of a "single crossing point" property. However, the implications of 
a general "single crossing point" property go beyond the specific applications shown here, and are also of interest 
on their own. 

Appendix 
A. Proofs of Lemmas 

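1) Proof of Lemma 1: Since $\mathbf{A}$ is positive semidefinite we can always write $\mathbf{A} = a\tilde{\mathbf{A}}\tilde{\mathbf{A}}^T$ such that $\mathrm{Tr}(\tilde{\mathbf{A}}\tilde{\mathbf{A}}^T) = n$ and $a \ge 0$. Then, it can be checked that

$$\begin{aligned} q_{\mathbf{A}}(X, \sigma^2, \gamma) &= \frac{\sigma^2}{1 + \sigma^2\gamma}\mathrm{Tr}(\mathbf{A}) - \mathrm{Tr}(\mathbf{A}\mathbf{E}_X(\gamma)) && (176) \\ &= a\left( \frac{\sigma^2}{1 + \sigma^2\gamma}\, n - \mathrm{Tr}(\tilde{\mathbf{A}}^T\mathbf{E}_X(\gamma)\tilde{\mathbf{A}}) \right) && (177) \\ &= a\left( \frac{\sigma^2}{1 + \sigma^2\gamma}\, n - \mathrm{Tr}(\mathbf{E}_{\tilde{\mathbf{A}}^T X}(\gamma)) \right) && (178) \\ &= a\, q_{\mathbf{I}_n}(\tilde{X}, \sigma^2, \gamma) && (179) \end{aligned}$$

where we have defined $\tilde{X} = \tilde{\mathbf{A}}^T X$. Now, from (179) and the fact that $a \ge 0$, the desired result follows.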
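2) Proof of Lemma 2: Let us consider the random vector $X \in \mathbb{R}^n$, whose covariance is given by $\mathbf{R}_X$, and denote its eigenvalues by $\lambda_{X,i}$. Recalling the model in (13), it is well known that $\mathbf{E}_X(\gamma) \preceq \mathbf{R}_X - \gamma\mathbf{R}_X(\gamma\mathbf{R}_X + \mathbf{I}_n)^{-1}\mathbf{R}_X$ [26]. Thus, we have that

$$\begin{aligned} \mathrm{Tr}(\mathbf{E}_X(\gamma)) &\le \mathrm{Tr}\left( \mathbf{R}_X - \gamma\mathbf{R}_X(\gamma\mathbf{R}_X + \mathbf{I}_n)^{-1}\mathbf{R}_X \right) && (180) \\ &= \sum_{i=1}^{n}\left( \lambda_{X,i} - \frac{\gamma\lambda_{X,i}^2}{1 + \gamma\lambda_{X,i}} \right) && (181) \\ &= \sum_{i=1}^{n}\frac{\lambda_{X,i}}{1 + \gamma\lambda_{X,i}}. && (182) \end{aligned}$$

Now, realizing that the right hand side in (182) is a Schur-concave function (it follows directly from the concavity of $\frac{\lambda}{1 + \gamma\lambda}$) and that, from the statement of Lemma 2, we have that $\sum_{i=1}^{n}\lambda_{X,i} \le n\sigma^2$, it follows directly from majorization theory [28] that the right hand side in (182) is maximized when $\lambda_{X,i}$ are uniformly distributed, i.e., $\lambda_{X,i} = \sigma^2$.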
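The maximization step at the end of the proof can be probed numerically. The sketch below compares the right hand side of (182) for randomly drawn eigenvalue profiles with a fixed sum $n\sigma^2$ against the uniform profile; the values of `n`, `sigma2`, `gamma` and the Dirichlet sampling are assumptions made for the example.

```python
import numpy as np

def gaussian_mmse_trace(eigs, gamma):
    """Right-hand side of (182): sum_i lambda_i / (1 + gamma * lambda_i)."""
    eigs = np.asarray(eigs, dtype=float)
    return float(np.sum(eigs / (1.0 + gamma * eigs)))

rng = np.random.default_rng(2)
n, sigma2, gamma = 5, 1.5, 0.8

# Value attained by the uniform eigenvalue profile lambda_i = sigma2.
uniform_value = gaussian_mmse_trace(np.full(n, sigma2), gamma)

worst_excess = -np.inf
for _ in range(500):
    # Nonnegative eigenvalues that sum to n * sigma2.
    eigs = rng.dirichlet(np.ones(n)) * n * sigma2
    worst_excess = max(worst_excess,
                       gaussian_mmse_trace(eigs, gamma) - uniform_value)
```

A non-positive `worst_excess` is what Schur-concavity predicts: no eigenvalue profile with the same sum exceeds the uniform one.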



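3) Proof of Lemma 3: From the definition in (16), it follows that

$$D_\gamma q_{\mathbf{A}}(X, \sigma^2, \gamma) = -\frac{\sigma^4}{(1 + \sigma^2\gamma)^2}\mathrm{Tr}(\mathbf{A}) - D_\gamma\mathrm{Tr}(\mathbf{A}\mathbf{E}_X(\gamma)). \qquad (183)$$

The expression for $D_\gamma\mathrm{Tr}(\mathbf{A}\mathbf{E}_X(\gamma))$ can be computed from the results in [14] and applying the chain rule as

$$\begin{aligned} D_\gamma\mathrm{Tr}(\mathbf{A}\mathbf{E}_X(\gamma)) &= D_{\mathbf{E}_X(\gamma)}\mathrm{Tr}(\mathbf{A}\mathbf{E}_X(\gamma)) \cdot D_{\mathbf{H}}\mathbf{E}_X(\gamma) \cdot D_\gamma\mathbf{H} && (184) \\ &= \mathrm{vec}^T(\mathbf{A}^T)\,\mathbf{D}_n\left( -2\mathbf{D}_n^{+}\mathrm{E}\{\mathbf{\Phi}_X(Y) \otimes \mathbf{\Phi}_X(Y)\}(\mathbf{I}_n \otimes \mathbf{H}^T) \right)\frac{1}{2\sqrt{\gamma}}\mathrm{vec}(\mathbf{I}_n) && (185) \\ &= -\mathrm{vec}^T(\mathbf{A}^T)\,\mathbf{N}_n\mathrm{E}\{\mathbf{\Phi}_X(Y) \otimes \mathbf{\Phi}_X(Y)\}\mathrm{vec}(\mathbf{I}_n) && (186) \\ &= -\mathrm{Tr}\left( \mathbf{A}\,\mathrm{E}\{\mathbf{\Phi}_X(Y)^2\} \right) && (187) \end{aligned}$$

where we have used that $\mathbf{H} = \sqrt{\gamma}\,\mathbf{I}_n$, $\mathbf{N}_n\mathrm{E}\{\mathbf{\Phi}_X(Y) \otimes \mathbf{\Phi}_X(Y)\} = \mathrm{E}\{\mathbf{\Phi}_X(Y) \otimes \mathbf{\Phi}_X(Y)\}\mathbf{N}_n$, and $\mathbf{N}_n\mathrm{vec}(\mathbf{I}_n) = \mathrm{vec}(\mathbf{I}_n)$ (see [14, App. A] for the definitions of the matrices $\mathbf{D}_n$ and $\mathbf{N}_n$ and some of their properties).
Plugging (187) in (183), the desired result follows.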



4) Proof of Lemma 5: For any arbitrarily distributed random vector $X$, with zero mean (assumed w.l.o.g.) and covariance matrix given by $\mathbf{R}_X$, it is well known that $\mathbf{E}_X(t) \preceq \mathbf{E}_G(\mathbf{R}_X, t)$, from which it follows that [22, Obs. 7.1.2]

$$[\mathbf{E}_X(t)]_{ii} \le [\mathbf{E}_G(\mathbf{R}_X, t)]_{ii} \qquad (188)$$

where we recall that $\mathbf{E}_G(\mathbf{R}_X, t)$ is the MMSE matrix attained assuming a zero mean Gaussian input with covariance matrix equal to $\mathbf{R}_X$. Observe that equality in (188) is attained if and only if $X \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_X)$.
Furthermore, from the fact that dependence among entries can only improve the MMSE, we have:

$$[\mathbf{E}_G(\mathbf{R}_X, t)]_{ii} \le [\mathbf{E}_G(\mathbf{I}_n \circ \mathbf{R}_X, t)]_{ii} = \frac{[\mathbf{R}_X]_{ii}}{1 + [\mathbf{H}(t)]_{ii}^2[\mathbf{R}_X]_{ii}} \qquad (189)$$

where $\mathbf{E}_G(\mathbf{I}_n \circ \mathbf{R}_X, t)$ represents the MMSE matrix when the entries of the input vector are independent Gaussian random variables (thus, with diagonal covariance matrix). Observe that equality in (189) is obtained if and only if the entries of the Gaussian distribution in the left hand side are independent.

Now, the desired result follows immediately from the fact that the right hand side in (189) is an increasing function of $[\mathbf{R}_X]_{ii}$.
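Inequality (189) can be illustrated for a specific correlated Gaussian input. The sketch below assumes unit-variance noise and a diagonal channel, and the example matrices `H` and `R` are hypothetical; it compares the diagonal of the Gaussian MMSE matrix with the closed form for independent entries.

```python
import numpy as np

def mmse_gaussian(R, H):
    """MMSE matrix of X ~ N(0, R) observed through Y = H X + N, N ~ N(0, I)."""
    n = R.shape[0]
    return R - R @ H.T @ np.linalg.inv(H @ R @ H.T + np.eye(n)) @ H @ R

H = np.diag([0.7, 1.2, 2.0])
R = np.array([[1.0, 0.6, 0.2],
              [0.6, 2.0, -0.5],
              [0.2, -0.5, 1.5]])          # correlated, positive definite

# Left-hand side of (189): diagonal of the MMSE matrix for the correlated input.
diag_correlated = np.diag(mmse_gaussian(R, H))

# Right-hand side of (189): closed form for independent entries.
diag_independent = np.diag(R) / (1.0 + np.diag(H) ** 2 * np.diag(R))

# Same quantity computed from the MMSE matrix with the off-diagonals removed.
diag_check = np.diag(mmse_gaussian(np.diag(np.diag(R)), H))
```

The entrywise comparison `diag_correlated <= diag_independent` reflects that correlation among the input entries can only help the estimator.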



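5) Proof of Lemma 6: We first provide the derivative of the MMSE with respect to the parameter $t$. Using equation (52), we have

$$D_t[\mathbf{E}_X(t)]_{ij} = \sum_{l} D_{[\mathbf{H}(t)]_{ll}}[\mathbf{E}_X(t)]_{ij}\,[D_t\mathbf{H}(t)]_{ll}. \qquad (190)$$

Using the result [14, eq. (131)],

$$\begin{aligned} D_{[\mathbf{H}(t)]_{ll}}[\mathbf{E}_X(t)]_{ij} &= -\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{il}[\mathbf{\Phi}_X(Y)]_{jl}[\mathbf{H}(t)]_{ll} + [\mathbf{\Phi}_X(Y)]_{jl}[\mathbf{\Phi}_X(Y)]_{il}[\mathbf{H}(t)]_{ll} \right\} \\ &= -2[\mathbf{H}(t)]_{ll}\,\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{il}[\mathbf{\Phi}_X(Y)]_{jl} \right\} \end{aligned} \qquad (191)$$

where $\mathbf{\Phi}_X(y)$ was defined in (9). The second equality in equation (191) is due to the fact that $\mathbf{H}(t)$ is diagonal. Thus, we can write the derivative of $[\mathbf{E}_X(t)]_{ij}$ as

$$\begin{aligned} D_t[\mathbf{E}_X(t)]_{ij} &= -2\sum_{l}[\mathbf{H}(t)]_{ll}\,\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{il}[\mathbf{\Phi}_X(Y)]_{jl} \right\}[D_t\mathbf{H}(t)]_{ll} \\ &= -2\sum_{l}[\mathbf{B}(t)]_{ll}\,\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{il}[\mathbf{\Phi}_X(Y)]_{jl} \right\} \end{aligned} \qquad (192)$$

since $[\mathbf{B}(t)]_{ll} = [\mathbf{H}(t)]_{ll}[D_t\mathbf{H}(t)]_{ll}$ (45). We can put this expression into a matrix form as follows:

$$D_t\mathbf{E}_X(t) = -2\sum_{l}[\mathbf{B}(t)]_{ll}\,\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{l}[\mathbf{\Phi}_X(Y)]_{l}^T \right\} \qquad (193)$$

where $[\mathbf{\Phi}_X(y)]_{l}$ is the $l$-th column of the matrix $\mathbf{\Phi}_X(y)$. Using the fact that for a Gaussian input distribution $\mathbf{\Phi}_{X_G}(y)$ does not depend on $y$ and thus $\mathbf{\Phi}_{X_G}(y) = \mathrm{E}\{\mathbf{\Phi}_{X_G}(Y)\} = \mathbf{E}_G(t)$ [14], we can obtain the following lower bound on the derivative of the matrix $\mathbf{Q}(X, \mathbf{R}_{X_G}, t)$:

$$\begin{aligned} D_t\mathbf{Q}(X, \mathbf{R}_{X_G}, t) &= 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( \mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{l}[\mathbf{\Phi}_X(Y)]_{l}^T \right\} - [\mathbf{E}_G]_{l}[\mathbf{E}_G]_{l}^T \right) \\ &\succeq 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( \mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{l} \right\}\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y)]_{l} \right\}^T - [\mathbf{E}_G]_{l}[\mathbf{E}_G]_{l}^T \right) \\ &= 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( [\mathbf{E}_X]_{l}[\mathbf{E}_X]_{l}^T - [\mathbf{E}_G]_{l}[\mathbf{E}_G]_{l}^T \right) \\ &= 2\left( \mathbf{E}_X(t)\mathbf{B}(t)\mathbf{E}_X^T(t) - \mathbf{E}_G(t)\mathbf{B}(t)\mathbf{E}_G^T(t) \right) \end{aligned}$$

where the inequality is due to Jensen. This concludes the proof of the lemma.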

■ 
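The Jensen step used above is the matrix inequality $\mathrm{E}\{vv^T\} \succeq \mathrm{E}\{v\}\mathrm{E}\{v\}^T$ for a random vector $v$. A small empirical sketch, with an arbitrary hypothetical sample distribution standing in for a column of $\mathbf{\Phi}_X(Y)$, illustrates it:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical samples of a random vector v (scaled and shifted Gaussian draws).
samples = rng.standard_normal((1000, 3)) * [1.0, 2.0, 0.5] + [0.3, -1.0, 2.0]

second_moment = samples.T @ samples / len(samples)   # empirical E{v v^T}
mean = samples.mean(axis=0)
outer_of_mean = np.outer(mean, mean)                 # E{v} E{v}^T

# The difference is the (biased) sample covariance, hence positive semidefinite.
gap_eigs = np.linalg.eigvalsh(second_moment - outer_of_mean)
```

All entries of `gap_eigs` should be nonnegative up to rounding, since the difference of the two matrices is a covariance matrix.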

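6) Proof of Lemma 7: Since $\mathbf{A}$ and $\mathbf{B}$ are two general positive semidefinite matrices, the dimension of the intersection of their null spaces, denoted by $\mathcal{N}(\cdot)$, fulfills

$$\dim \mathcal{N}(\mathbf{A}) \cap \mathcal{N}(\mathbf{B}) = k, \quad 0 \le k \le n. \qquad (194)$$

Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ be an orthonormal basis of the $n$-dimensional space such that $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is an orthonormal basis of $\mathcal{N}(\mathbf{A}) \cap \mathcal{N}(\mathbf{B})$ and define $\mathbf{U} = [\mathbf{u}_1 \ldots \mathbf{u}_n]$. We thus have

$$\mathbf{U}^T\mathbf{A}\mathbf{U} = \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}' \end{pmatrix}, \quad \mathbf{U}^T\mathbf{B}\mathbf{U} = \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{B}' \end{pmatrix}, \qquad (195)$$

where $\mathbf{A}'$ and $\mathbf{B}'$ are the non-zero $(n-k) \times (n-k)$ lower right square sub-matrices of $\mathbf{U}^T\mathbf{A}\mathbf{U}$ and $\mathbf{U}^T\mathbf{B}\mathbf{U}$, respectively. Observe that now we have $\mathcal{N}(\mathbf{A}') \cap \mathcal{N}(\mathbf{B}') = \{\mathbf{0}\}$.

Now, from [22, Sec. 4.5, Prob. 8(e)], we have that $\mathbf{A}$ and $\mathbf{B}$ are simultaneously diagonalizable by an invertible matrix $\mathbf{S}$ if and only if $\mathbf{A}'$ and $\mathbf{B}'$ are also simultaneously diagonalizable. Consequently, we have reduced our proof to showing the simultaneous diagonalization of two positive semidefinite matrices such that the dimension of the intersection of their null spaces is 0.

From this point, we can thus assume the following:

$$\mathbf{A} = \tilde{\mathbf{A}}^T\tilde{\mathbf{A}} \succeq \mathbf{0}, \qquad (196)$$

$$\mathbf{B} = \tilde{\mathbf{B}}^T\tilde{\mathbf{B}} \succeq \mathbf{0}, \qquad (197)$$

$$\dim \mathcal{N}(\mathbf{A}) \cap \mathcal{N}(\mathbf{B}) = 0. \qquad (198)$$

The next step is to prove that $\mathbf{A}$ and $\mathbf{B}$ have no common isotropic vector, which is defined in [29, Def. 1.7.14] as a vector $\mathbf{x} \ne \mathbf{0}$ such that $\mathbf{x}^T\mathbf{A}\mathbf{x} = 0$ and $\mathbf{x}^T\mathbf{B}\mathbf{x} = 0$ are both simultaneously fulfilled.
Using the expression in (196), we have that

$$\mathbf{x}^T\mathbf{A}\mathbf{x} = 0 \Leftrightarrow \mathbf{x}^T\tilde{\mathbf{A}}^T\tilde{\mathbf{A}}\mathbf{x} = 0 \Leftrightarrow \tilde{\mathbf{A}}\mathbf{x} = \mathbf{0}, \qquad (199)$$

which can also be applied to $\mathbf{x}^T\mathbf{B}\mathbf{x} = 0$. Consequently, if a vector $\mathbf{x}$ fulfills $\mathbf{x}^T\mathbf{A}\mathbf{x} = 0$ and $\mathbf{x}^T\mathbf{B}\mathbf{x} = 0$, we have necessarily that $\mathbf{x} \in \mathcal{N}(\tilde{\mathbf{A}}) \cap \mathcal{N}(\tilde{\mathbf{B}})$. However, since $\mathcal{N}(\tilde{\mathbf{A}}) \subseteq \mathcal{N}(\mathbf{A})$ and, similarly, $\mathcal{N}(\tilde{\mathbf{B}}) \subseteq \mathcal{N}(\mathbf{B})$, from (198) we have that $\dim \mathcal{N}(\tilde{\mathbf{A}}) \cap \mathcal{N}(\tilde{\mathbf{B}}) = 0$, which implies that $\mathbf{A}$ and $\mathbf{B}$ have no common isotropic vector. Now, from [29, Th. 1.7.17] we have that $\mathbf{A}$ and $\mathbf{B}$ are simultaneously diagonalizable.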



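7) Proof of Lemma 11: For this proof we require the following result:

Lemma 18: Consider two positive semidefinite matrices $\mathbf{A}$ and $\mathbf{B}$. Then we have

$$\mu_{\max}(\mathbf{A} - \mathbf{B}) \ge \mu_{\max}(\mathbf{A}) - \mu_{\max}(\mathbf{B}) \qquad (200)$$

where we recall that $\mu_{\max}(\mathbf{A})$ denotes the maximum eigenvalue of matrix $\mathbf{A}$.

Proof: The proof follows directly from [22, Th. 4.3.1] recalling that $\mu_{\min}(-\mathbf{B}) = -\mu_{\max}(\mathbf{B})$. ■

Now, it is clear that for a positive semidefinite matrix $\mathbf{A}$ and two positive semidefinite diagonal matrices $\mathbf{D}_1$ and $\mathbf{D}_2$ we have that $\mathbf{D}_i\mathbf{A}\mathbf{D}_i \succeq \mathbf{0}$, $i = 1, 2$. Now, from the above lemma we have,

$$\begin{aligned} \mu_{\max}(\mathbf{D}_1\mathbf{A}\mathbf{D}_1 - \mathbf{D}_2\mathbf{A}\mathbf{D}_2) &\ge \mu_{\max}(\mathbf{D}_1\mathbf{A}\mathbf{D}_1) - \mu_{\max}(\mathbf{D}_2\mathbf{A}\mathbf{D}_2) \\ &= \mu_{\max}(\mathbf{A}^{\frac{1}{2}}\mathbf{D}_1^2\mathbf{A}^{\frac{1}{2}}) - \mu_{\max}(\mathbf{A}^{\frac{1}{2}}\mathbf{D}_2^2\mathbf{A}^{\frac{1}{2}}) \end{aligned} \qquad (201)$$

where the last equality follows from [22, Th. 1.3.20] and the remark, for square matrices, in the paragraph preceding it. Finally, since $\mathbf{D}_1 \succeq \mathbf{D}_2 \succeq \mathbf{0}$ and they are both diagonal, we have that $\mathbf{D}_1^2 \succeq \mathbf{D}_2^2 \succeq \mathbf{0}$ and, using [22, Obs. 7.7.2] and [22, Cor. 7.7.4] we can write,

$$\mu_{\max}(\mathbf{A}^{\frac{1}{2}}\mathbf{D}_1^2\mathbf{A}^{\frac{1}{2}}) \ge \mu_{\max}(\mathbf{A}^{\frac{1}{2}}\mathbf{D}_2^2\mathbf{A}^{\frac{1}{2}}) \qquad (202)$$

from which the desired result follows.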



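8) Proof of Lemma 13: We extend the lower bound derived in Lemma 6 to the conditioned case, that is, we assume $U - X - Y$. From (193), for the conditioned case we have the following:

$$D_t\mathbf{E}_{X|U}(t, U = u) = -2\sum_{l}[\mathbf{B}(t)]_{ll}\,\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y, U = u)]_{l}[\mathbf{\Phi}_X(Y, U = u)]_{l}^T \right\}. \qquad (203)$$

Taking expectation according to $U$ on both sides we have:

$$D_t\mathbf{E}_{X|U}(t) = \mathrm{E}\left\{ D_t\mathbf{E}_{X|U}(t, U) \right\} = -2\sum_{l}[\mathbf{B}(t)]_{ll}\,\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y, U)]_{l}[\mathbf{\Phi}_X(Y, U)]_{l}^T \right\}. \qquad (204)$$

The derivative of $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t)$ is then given by:

$$\begin{aligned} D_t\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t) &= 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( \mathrm{E}\left\{ [\mathbf{\Phi}_X(Y, U)]_{l}[\mathbf{\Phi}_X(Y, U)]_{l}^T \right\} - \mathrm{E}\left\{ [\mathbf{\Phi}_{X_G}(Y)]_{l}[\mathbf{\Phi}_{X_G}(Y)]_{l}^T \right\} \right) \\ &= 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( \mathrm{E}\left\{ [\mathbf{\Phi}_X(Y, U)]_{l}[\mathbf{\Phi}_X(Y, U)]_{l}^T \right\} - [\mathbf{E}_G(t)]_{l}[\mathbf{E}_G(t)]_{l}^T \right) \\ &\succeq 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( \mathrm{E}\left\{ [\mathbf{\Phi}_X(Y, U)]_{l} \right\}\mathrm{E}\left\{ [\mathbf{\Phi}_X(Y, U)]_{l}^T \right\} - [\mathbf{E}_G(t)]_{l}[\mathbf{E}_G(t)]_{l}^T \right) \\ &= 2\sum_{l}[\mathbf{B}(t)]_{ll}\left( [\mathbf{E}_{X|U}(t)]_{l}[\mathbf{E}_{X|U}(t)]_{l}^T - [\mathbf{E}_G(t)]_{l}[\mathbf{E}_G(t)]_{l}^T \right) \\ &= 2\left( \mathbf{E}_{X|U}(t)\mathbf{B}(t)\mathbf{E}_{X|U}^T(t) - \mathbf{E}_G(t)\mathbf{B}(t)\mathbf{E}_G^T(t) \right) \end{aligned} \qquad (205)$$

where the inequality is due to Jensen. This completes the proof of the lemma.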
9) Proof of Lemma 14: We first claim that w.l.o.g. we can restrict the proof to $\mathbf{H}(t_e) = \mathbf{I}$. This is shown by redefining $\tilde{X} = \mathbf{H}(t_e)X$. Now, if $\mathbf{H}(t_e)$ is non-singular, then this redefinition does not change the mutual information, i.e., $I(\tilde{X}; Y(t_e)|U) = I(X; Y(t_e)|U)$, and requirements 1 and 3 are preserved under any congruent transformation. If $\mathbf{H}(t_e)$ is singular, the problem can first be reduced in size, since $\mathbf{H}(t)$ is diagonal for all $t$. Thus, from this point on, we will assume $\mathbf{H}(t_e) = \mathbf{I}$.

We provide a constructive proof, and show how one can build a Gaussian input distribution such that all three requirements are fulfilled. We begin by rewriting requirement 1 as a condition on the matrix $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)$ rather than on the covariance matrix $\mathbf{R}_{X_G}$. We do so by defining a new matrix, which is the distance of the MMSE matrix $\mathbf{E}_{X|U}(t_e)$ from the linear MSE matrix $\mathbf{E}_X^{L}(t_e)$. We proceed by showing that there exists a fraction such that, by defining $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)$ to be that fraction of the newly defined matrix, we also comply with requirement 2.

As explained above, we begin by rewriting requirement 1 in terms of the matrix $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)$. Requirement 3 is already a requirement on the matrix $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)$ and is as follows:

$$\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) = \mathbf{E}_G(t_e) - \mathbf{E}_{X|U}(t_e) \succeq \mathbf{0}. \qquad (206)$$

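The MMSE for the Gaussian input is:

$$\begin{aligned} \mathbf{E}_G(t_e) &= \mathbf{R}_{X_G} - \mathbf{R}_{X_G}(\mathbf{R}_{X_G} + \mathbf{I})^{-1}\mathbf{R}_{X_G} \\ &= \mathbf{R}_{X_G} - \mathbf{R}_{X_G}(\mathbf{R}_{X_G} + \mathbf{I})^{-1}(\mathbf{R}_{X_G} + \mathbf{I}) + \mathbf{R}_{X_G}(\mathbf{R}_{X_G} + \mathbf{I})^{-1} \\ &= \mathbf{R}_{X_G}(\mathbf{R}_{X_G} + \mathbf{I})^{-1} \\ &= (\mathbf{R}_{X_G} + \mathbf{I})(\mathbf{R}_{X_G} + \mathbf{I})^{-1} - (\mathbf{R}_{X_G} + \mathbf{I})^{-1} \\ &= \mathbf{I} - (\mathbf{R}_{X_G} + \mathbf{I})^{-1}. \end{aligned} \qquad (207)$$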
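The chain of equalities in (207) is easy to confirm numerically for a specific covariance. The matrix `R` below is a hypothetical example standing in for $\mathbf{R}_{X_G}$, and the channel is the identity as assumed at this point of the proof.

```python
import numpy as np

R = np.array([[2.0, 0.4],
              [0.4, 0.5]])            # example Gaussian input covariance
I = np.eye(2)
inv = np.linalg.inv(R + I)

first_line = R - R @ inv @ R          # R - R (R + I)^{-1} R
middle_line = R @ inv                 # R (R + I)^{-1}
last_line = I - inv                   # I - (R + I)^{-1}
```

All three expressions should agree up to rounding, which is the content of (207).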

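From equation (206), $\mathbf{R}_{X_G}$ complies with the following:

$$(\mathbf{R}_{X_G} + \mathbf{I})^{-1} = \mathbf{I} - \mathbf{E}_{X|U}(t_e) - \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e). \qquad (208)$$

Note that the above equation connects $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)$ with $\mathbf{R}_{X_G}$. Thus, given a specific substitution of $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)$ we have a complete definition of the Gaussian input distribution. Similarly, the MMSE assuming an optimal linear estimator of $X$ (only from $Y(t_e)$) is given by:

$$\mathbf{E}_X^{L} = \mathbf{I} - (\mathbf{R}_X + \mathbf{I})^{-1} \qquad (209)$$

and we have that,

$$\mathbf{E}_{X|U}(t_e) \preceq \mathbf{E}_X^{L}(t_e) \quad \forall t. \qquad (210)$$

Thus, we can define:

$$\begin{aligned} \mathbf{C} &= \mathbf{E}_X^{L}(t_e) - \mathbf{E}_{X|U}(t_e) = \mathbf{I} - (\mathbf{R}_X + \mathbf{I})^{-1} - \mathbf{E}_{X|U}(t_e) \\ \mathbf{E}_{X|U}(t_e) &= \mathbf{I} - (\mathbf{R}_X + \mathbf{I})^{-1} - \mathbf{C}, \quad \mathbf{C} \succeq \mathbf{0}. \end{aligned} \qquad (211)$$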
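Note that $\mathbf{C}$ is completely defined by the input random vector, $X$. Inserting (211) into equation (208) we have:

$$\begin{aligned} (\mathbf{R}_{X_G} + \mathbf{I})^{-1} &= \mathbf{I} - \left[ \mathbf{I} - (\mathbf{R}_X + \mathbf{I})^{-1} - \mathbf{C} \right] - \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \\ &= (\mathbf{R}_X + \mathbf{I})^{-1} + \mathbf{C} - \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e), \quad \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \succeq \mathbf{0}, \quad \mathbf{C} \succeq \mathbf{0}. \end{aligned} \qquad (212)$$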

We now require the following supporting lemma.

Lemma 19: Assume $X \in \mathbb{R}^n$ is an arbitrarily distributed random vector. For any $t' \in [0, \infty)$ there exists a Gaussian random vector, $X_G$, with covariance matrix $\mathbf{R}_{X_G}$ such that,

1) $\mathbf{0} \preceq \mathbf{R}_{X_G} \preceq \mathbf{R}_X$

2) $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t') = \mathbf{0}$

3) $I(X_G; Y_G(t')) \le I(X; Y(t')|U)$

Proof: See Appendix A10. ■

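Note that according to Lemma 19, for $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) = \mathbf{0}$ there exists a Gaussian random
vector, $X_G$, which ensures $I(X_G; Y_G(t_e)) \le I(X; Y(t_e)|U)$. On the other hand, if $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) = \mathbf{C}$
we have, according to (212), that $\mathbf{R}_{X_G} = \mathbf{R}_X$, in which case $I(X_G; Y_G(t_e)) \ge I(X; Y(t_e)|U)$.
Moreover, from (212) we can observe that instead of requirement 1, i.e., $\mathbf{R}_{X_G} \preceq \mathbf{R}_X$, we may simply require
$\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \preceq \mathbf{C}$; thus requirements 1 and 3 can be written as follows,

$$\mathbf{0} \preceq \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \preceq \mathbf{C} \tag{213}$$

where $\mathbf{C}$ is defined in equation (211). The question is whether there exists such a $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e) \preceq \mathbf{C}$ that
will also attain requirement 2, i.e., $I(X_G; Y_G(t_e)) = I(X; Y(t_e)|U) = \alpha$. From the above we know
that,

$$\frac{1}{2}\log\left|\mathbf{I} + \mathbf{R}_{1,X_G}\right| \le \alpha \le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{R}_{2,X_G}\right| \tag{214}$$

where,

$$\mathbf{I} + \mathbf{R}_{1,X_G} = \left((\mathbf{R}_X + \mathbf{I})^{-1} + \mathbf{C}\right)^{-1} \tag{215}$$

$$\mathbf{I} + \mathbf{R}_{2,X_G} = \mathbf{I} + \mathbf{R}_X. \tag{216}$$

Thus, (214) can be rewritten as:

$$\frac{1}{2}\log\left|\left((\mathbf{R}_X + \mathbf{I})^{-1} + \mathbf{C}\right)^{-1}\right| \le \alpha \le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{R}_X\right|. \tag{217}$$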

We now need the following result.
Lemma 20: Let's define the function:

$$r(\nu) = \frac{1}{2}\log\frac{|\mathbf{A}|}{|\mathbf{B} + \nu\boldsymbol{\Delta}|}. \tag{218}$$

For $\mathbf{A} \succ \mathbf{0}$, $\mathbf{B} \succ \mathbf{0}$ and $\boldsymbol{\Delta} \succeq \mathbf{0}$, the function $r(\nu)$ is continuous and monotonically decreasing in $\nu$ for $0 \le \nu \le 1$.
Proof: The proof is similar to the proof of Lemma 10 in [5]. ■

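In our case we have:

$$\mathbf{A} = \mathbf{I} \succ \mathbf{0} \tag{219}$$

$$\mathbf{B} = (\mathbf{I} + \mathbf{R}_X)^{-1} \succ \mathbf{0} \tag{220}$$

$$\boldsymbol{\Delta} = \mathbf{C} \succeq \mathbf{0} \tag{221}$$

and,

$$\left.\frac{1}{2}\log\frac{|\mathbf{A}|}{|\mathbf{B} + \nu\boldsymbol{\Delta}|}\right|_{\nu=1} \le \alpha \le \left.\frac{1}{2}\log\frac{|\mathbf{A}|}{|\mathbf{B} + \nu\boldsymbol{\Delta}|}\right|_{\nu=0}. \tag{222}$$

Thus, according to Lemma 20, there exists a $\nu^*$ such that $r(\nu^*) = \alpha$. That is,

$$\alpha = \frac{1}{2}\log\frac{|\mathbf{A}|}{|\mathbf{B} + \nu^*\boldsymbol{\Delta}|} = \frac{1}{2}\log\frac{|\mathbf{I}|}{\left|(\mathbf{R}_X + \mathbf{I})^{-1} + \nu^*\mathbf{C}\right|} \tag{223}$$

where the last equality is due to equation (212). That is,

$$\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)^* = (1 - \nu^*)\mathbf{C} \tag{224}$$

and since $0 \le \nu^* \le 1$ we have that $\mathbf{0} \preceq \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)^* \preceq \mathbf{C}$, as required. To conclude, we can construct a
Gaussian input distribution, complying with all three requirements, as follows,

$$\mathbf{I} + \mathbf{R}_{X_G} = \left((\mathbf{R}_X + \mathbf{I})^{-1} + \mathbf{C} - \mathbf{Q}(X|U, \mathbf{R}_{X_G}, t_e)^*\right)^{-1} = \left((\mathbf{R}_X + \mathbf{I})^{-1} + \nu^*\mathbf{C}\right)^{-1} \tag{225}$$

where $\nu^*$ is derived from the equality in (223). This completes the proof of the lemma.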
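The construction above can be illustrated numerically. The following is a minimal sketch, assuming the function of Lemma 20 has the form $r(\nu) = \frac{1}{2}\log(|\mathbf{A}|/|\mathbf{B} + \nu\boldsymbol{\Delta}|)$ used in (222) and (223); the matrices `R_X` and `C`, the target `alpha`, and the bisection routine `find_nu_star` are hypothetical choices for illustration, not the authors' procedure.

```python
import numpy as np

def r(nu, A, B, Delta):
    """r(nu) = 0.5 * log(|A| / |B + nu * Delta|), natural logarithm."""
    logdet_A = np.linalg.slogdet(A)[1]
    logdet_M = np.linalg.slogdet(B + nu * Delta)[1]
    return 0.5 * (logdet_A - logdet_M)

def find_nu_star(alpha, A, B, Delta, tol=1e-12):
    """Bisection on [0, 1]; relies on r being continuous and decreasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r(mid, A, B, Delta) > alpha:
            lo = mid   # r is still too large, so move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical input covariance and a hypothetical gap matrix C with
# 0 <= C <= I - (R_X + I)^{-1}, standing in for the matrix defined in (211).
R_X = np.array([[2.0, 0.5], [0.5, 1.0]])
I = np.eye(2)
B = np.linalg.inv(R_X + I)
C = 0.4 * (I - B)

# Any target between r(1) and r(0) is admissible; take the midpoint.
alpha = 0.5 * (r(0.0, I, B, C) + r(1.0, I, B, C))
nu_star = find_nu_star(alpha, I, B, C)

# Gaussian covariance from (225) and the matching Q from (224).
R_XG = np.linalg.inv(B + nu_star * C) - I
Q_star = (1.0 - nu_star) * C
```

Bisection is used here only because Lemma 20 guarantees monotonicity; any one-dimensional root finder would serve equally well.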



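10) Proof of Lemma 19: We first show that there exists a covariance matrix $\mathbf{R}_{X_G}$ such that requirements 1 and
2 are fulfilled. Then, we will show, by contradiction, that requirement 3 is also fulfilled.
First note that requirement 2, i.e., $\mathbf{Q}(X|U, \mathbf{R}_{X_G}, t') = \mathbf{0}$, completely defines $\mathbf{R}_{X_G}$:

$$\begin{aligned}
\mathbf{I} - (\mathbf{R}_{X_G} + \mathbf{I})^{-1} &= \mathbf{E}_{X|U}(t') \\
(\mathbf{I} - \mathbf{E}_{X|U}(t'))^{-1} - \mathbf{I} &= \mathbf{R}_{X_G}
\end{aligned} \tag{226}$$

where we have used the expression in (207), and using the expression in (209) we can show that $\mathbf{I} - \mathbf{E}_{X|U}(t')$ is
an invertible matrix since,

$$\mathbf{E}_{X|U}(t') \preceq \mathbf{E}^{L}(t') = \mathbf{I} - (\mathbf{R}_X + \mathbf{I})^{-1} \prec \mathbf{I}. \tag{227}$$

We now need to check that the first requirement holds:

$$\begin{aligned}
\mathbf{R}_{X_G} = (\mathbf{I} - \mathbf{E}_{X|U}(t'))^{-1} - \mathbf{I} &\succeq \mathbf{0} \\
\mathbf{I} - \mathbf{E}_{X|U}(t') &\preceq \mathbf{I} \\
\mathbf{0} &\preceq \mathbf{E}_{X|U}(t')
\end{aligned} \tag{228}$$

and,

$$\begin{aligned}
\mathbf{E}_{X|U}(t') &\preceq \mathbf{E}^{L}(t') = \mathbf{I} - (\mathbf{R}_X + \mathbf{I})^{-1} \\
\mathbf{R}_X + \mathbf{I} &\succeq (\mathbf{I} - \mathbf{E}_{X|U}(t'))^{-1} \\
\mathbf{R}_X - (\mathbf{I} - \mathbf{E}_{X|U}(t'))^{-1} + \mathbf{I} &\succeq \mathbf{0} \\
\mathbf{R}_X &\succeq \mathbf{R}_{X_G}.
\end{aligned} \tag{229}$$

Thus, we have shown that given an arbitrary input we can find the required $\mathbf{R}_{X_G}$.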
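The first two requirements of Lemma 19 can be checked numerically. The following is a minimal sketch, assuming the Gaussian MMSE has the form $\mathbf{I} - (\mathbf{R} + \mathbf{I})^{-1}$ from (207); the input covariance `R_X` and the stand-in MMSE matrix `E` are hypothetical, chosen only so that `E` lies between zero and the linear MSE matrix.

```python
import numpy as np

def gaussian_cov_matching_mmse(E):
    """Covariance whose Gaussian MMSE matrix equals E, as in (226)."""
    I = np.eye(E.shape[0])
    return np.linalg.inv(I - E) - I

# Hypothetical input covariance and its linear MSE matrix, see (209).
R_X = np.array([[1.5, -0.3], [-0.3, 0.8]])
I = np.eye(2)
E_lin = I - np.linalg.inv(R_X + I)

# Hypothetical conditional MMSE matrix with 0 <= E <= E_lin.
E = 0.7 * E_lin

R_XG = gaussian_cov_matching_mmse(E)

# Requirement 2: the Gaussian MMSE reproduces E, so Q = E_G - E vanishes.
E_G = I - np.linalg.inv(R_XG + I)
Q = E_G - E

# Requirement 1: 0 <= R_XG <= R_X, checked through the smallest eigenvalues.
min_eig_R_XG = float(np.linalg.eigvalsh(0.5 * (R_XG + R_XG.T)).min())
gap = R_X - R_XG
min_eig_gap = float(np.linalg.eigvalsh(0.5 * (gap + gap.T)).min())
```

Both smallest eigenvalues should be nonnegative up to rounding, mirroring (228) and (229).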

We now want to show that $I(X_G; Y_G(t')) \le I(X; Y(t')|U)$. For any Gaussian random vector $X_{G^*}$ with
covariance $\mathbf{R}_{X_{G^*}}$ such that $\mathbf{R}_{X_{G^*}} \prec \mathbf{R}_{X_G}$ we have $\mathbf{E}_{G^*}(t') \prec \mathbf{E}_G(t')$ and thus $\mathbf{Q}(X|U, \mathbf{R}_{X_{G^*}}, t') \prec \mathbf{0}$. Using
Theorem 8, assuming that we do not have $\mathbf{B}(t) = \mathbf{0}$ for all $0 \le t \le t'$ $^4$, and the I-MMSE relationship (47) we
have,

$$I(X_{G^*}; Y_{G^*}(t')) < I(X; Y(t')|U). \tag{230}$$

Now let's assume that,

$$I(X_G; Y_G(t')) > I(X; Y(t')|U). \tag{231}$$

The function $I(X_G; Y_G(t'))$ is continuous in the eigenvalues of $\mathbf{R}_{X_G}$, since,

$$I(X_G; Y_G(t')) = \frac{1}{2}\sum_{i=1}^{n}\log\left(1 + \lambda_i(\mathbf{R}_{X_G})\right). \tag{232}$$

We can construct $\mathbf{R}_{X_{G^*}}$ by reducing by $\epsilon$ the value of all eigenvalues of $\mathbf{R}_{X_G}$. According to (231) we can find a
small enough $\epsilon$, such that the following inequality still holds:

$$I(X_G; Y_G(t')) > I(X_{G^*}; Y_{G^*}(t')) > I(X; Y(t')|U) \tag{233}$$

but this contradicts (230) and by that proves that,

$$I(X_G; Y_G(t')) \le I(X; Y(t')|U). \tag{234}$$

This concludes the proof of the lemma.

$^4$ If $\mathbf{B}(t) = \mathbf{0}$ for all $0 \le t \le t'$ then all mutual informations are equal to zero regardless of the input distribution and the lemma holds trivially.

B. Converse Proof of BC Capacity Under Per-Antenna Constraints for M-Users
We consider the degraded parallel Gaussian BC channel:

$$Y_j[m] = \mathbf{H}_j X[m] + N_j[m], \quad j = 1, \ldots, M \tag{235}$$

where $N_j[m]$, $j = 1, \ldots, M$ are standard additive Gaussian noise vectors independent for different time indices $m$
(and can be considered independent of each other), and $\mathbf{H}_j$, $j = 1, \ldots, M$ are diagonal positive semidefinite matrices
such that $\mathbf{H}_j \preceq \mathbf{H}_{j+1}$, for all $j = 1, \ldots, M-1$. $X \in \mathbb{R}^n$ is the random input vector and it is assumed independent
for different time indices $m$.

We consider a per-antenna power constraint:

$$\left[\mathrm{E}\{XX^T\}\right]_{ii} \le P_i, \quad \forall i,\ 1 \le i \le n. \tag{236}$$

Since we have a degraded BC, we can use the single-letter expression given in [21]:

$$R_j \le I(V_j; Y_j | V_{j-1}), \quad j = 1, \ldots, M \tag{237}$$

where $V_j$ are auxiliary random variables, $V_M = X$, $V_0 = \emptyset$, and the union is over all probability distributions
satisfying

$$V_0 - \cdots - V_{M-1} - V_M = X - Y_M - Y_{M-1} - \cdots - Y_2 - Y_1. \tag{238}$$

This is an extension of the proof given for the two user case. We begin by rewriting the single-letter expression
(237) as follows:

$$R_j \le I(X; Y_j | V_{j-1}) - I(X; Y_j | V_j), \quad j = 1, \ldots, M \tag{239}$$

and more explicitly:

$$\begin{aligned}
R_1 &\le I(X; Y_1) - I(X; Y_1 | V_1) \\
R_2 &\le I(X; Y_2 | V_1) - I(X; Y_2 | V_2) \\
&\ \ \vdots \\
R_{M-2} &\le I(X; Y_{M-2} | V_{M-3}) - I(X; Y_{M-2} | V_{M-2}) \\
R_{M-1} &\le I(X; Y_{M-1} | V_{M-2}) - I(X; Y_{M-1} | V_{M-1}) \\
R_M &\le I(X; Y_M | V_{M-1}).
\end{aligned} \tag{240}$$

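According to Lemma 4 (and the remark after this lemma) we can construct a diagonal path such that

$$\begin{aligned}
\mathbf{H}(t_j) &= \mathbf{H}_j, \quad j = 1, \ldots, M \\
\mathbf{H}(0) &= \mathbf{0}
\end{aligned} \tag{241}$$

with $0 \le t_1 \le t_2 \le \ldots \le t_M$. Now, assume a distribution $P_{V_0=\emptyset, V_1, \ldots, V_{M-1}, V_M=X}$ on the tuple
$(V_0 = \emptyset, V_1, \ldots, V_{M-1}, V_M = X)$ with covariance matrix $\mathbf{R}_X$. We begin by proving the following lemma:
Lemma 21: There exist $M$ independent Gaussian inputs $X_{G_j}$, with covariance matrices $\boldsymbol{\Lambda}_{G_j}$, such that

$$\begin{aligned}
\int_0^{t_j} d_i(X | V_j, \boldsymbol{\Lambda}_{G_j}, \tau)\, d\tau &= 0 \\
\int_0^{t_{j+1}} d_i(X | V_j, \boldsymbol{\Lambda}_{G_j}, \tau)\, d\tau &\ge 0, \quad \forall i,\ \forall j = 1, \ldots, M-1
\end{aligned} \tag{242}$$

and such that $[\boldsymbol{\Lambda}_{G_j}]_{ii} \le [\boldsymbol{\Lambda}_{G_{j-1}}]_{ii}$ for $j = 2, \ldots, M-1$ and $[\boldsymbol{\Lambda}_{G_1}]_{ii} \le [\mathbf{R}_X]_{ii}$.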
Proof: We will prove the above using induction. 

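The case of $j = 1$: This is identical to the proof given in Section IV-D.

For a general $j$: We assume the above holds for $j$ and prove for $j + 1$. Due to the Markov relation (238) we
have that,

$$d_i(X | V_{j+1}, \boldsymbol{\Lambda}_{G_j}, t) = d_i(X | V_j, V_{j+1}, \boldsymbol{\Lambda}_{G_j}, t), \quad \forall t \tag{243}$$

and thus,

$$d_i(X | V_{j+1}, \boldsymbol{\Lambda}_{G_j}, t) = d_i(X | V_j, V_{j+1}, \boldsymbol{\Lambda}_{G_j}, t) \ge d_i(X | V_j, \boldsymbol{\Lambda}_{G_j}, t), \quad \forall t \tag{244}$$

where the inequality is, again, due to the Markov relation (238) and the definition of the function $d_i(X | V_j, \boldsymbol{\Lambda}_G, t)$,
given in equation (79). This provides us with the following inequality,

$$\int_0^{t_{j+1}} d_i(X | V_{j+1}, \boldsymbol{\Lambda}_{G_j}, \tau)\, d\tau \ge \int_0^{t_{j+1}} d_i(X | V_j, \boldsymbol{\Lambda}_{G_j}, \tau)\, d\tau \ge 0 \tag{245}$$

where the first inequality is due to (244) and the second is due to the induction assumption on $j$ (242). Again,
following the same derivation as in the proof in Section IV-D, we know that there exists an independent Gaussian
input with covariance $\boldsymbol{\Lambda}_{G_{j+1}}$ such that,

$$\int_0^{t_{j+1}} d_i(X | V_{j+1}, \boldsymbol{\Lambda}_{G_{j+1}}, \tau)\, d\tau = 0 \tag{246}$$

$$\int_0^{t'} d_i(X | V_{j+1}, \boldsymbol{\Lambda}_{G_{j+1}}, \tau)\, d\tau \ge 0, \quad \forall t' \ge t_{j+1},\ \forall i \tag{247}$$

where (247) is true specifically for $t' = t_{j+2}$. Finally, from (246) and (245) and the monotonically increasing
property of $d_i(X | V_{j+1}, \boldsymbol{\Lambda}_G, t)$ in $[\boldsymbol{\Lambda}_G]_{ii}$ (fourth property of Corollary 3), and the fact that it is independent of
all other entries in $\boldsymbol{\Lambda}_G$, we can conclude that $[\boldsymbol{\Lambda}_{G_{j+1}}]_{ii} \le [\boldsymbol{\Lambda}_{G_j}]_{ii}$. This concludes the proof of the induction. ■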
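Now, inserting the above bounds (242) (with the addition of the trivial bound on $I(X; Y_1)$, under the per-antenna
constraint (236)) into the single-letter expression in (240) we obtain the following outer bound

$$\begin{aligned}
R_1 &\le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_1\mathbf{P}\mathbf{H}_1^T\right| - \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_1\boldsymbol{\Lambda}_{G_1}\mathbf{H}_1^T\right| \\
R_2 &\le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_2\boldsymbol{\Lambda}_{G_1}\mathbf{H}_2^T\right| - \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_2\boldsymbol{\Lambda}_{G_2}\mathbf{H}_2^T\right| \\
&\ \ \vdots \\
R_{M-2} &\le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_{M-2}\boldsymbol{\Lambda}_{G_{M-3}}\mathbf{H}_{M-2}^T\right| - \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_{M-2}\boldsymbol{\Lambda}_{G_{M-2}}\mathbf{H}_{M-2}^T\right| \\
R_{M-1} &\le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_{M-1}\boldsymbol{\Lambda}_{G_{M-2}}\mathbf{H}_{M-1}^T\right| - \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_{M-1}\boldsymbol{\Lambda}_{G_{M-1}}\mathbf{H}_{M-1}^T\right| \\
R_M &\le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_M\boldsymbol{\Lambda}_{G_{M-1}}\mathbf{H}_M^T\right|
\end{aligned} \tag{248}$$

where $\mathbf{P}$ is a diagonal matrix with $[\mathbf{P}]_{ii} = P_i$, and $\boldsymbol{\Lambda}_{G_j}$ are positive semidefinite diagonal matrices such that
$\mathbf{0} \preceq \boldsymbol{\Lambda}_{G_M} \preceq \boldsymbol{\Lambda}_{G_{M-1}} \preceq \cdots \preceq \boldsymbol{\Lambda}_{G_2} \preceq \boldsymbol{\Lambda}_{G_1} \preceq \mathbf{P}$. The achievability of this outer bound is well known using
superposition coding.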
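To make the structure of the outer bound (248) concrete, the following is a minimal sketch that evaluates the right-hand sides for given diagonal matrices. The function names, the three-user example, and all numerical values are hypothetical; rates are in nats because the natural logarithm is used.

```python
import numpy as np

def half_logdet(M):
    """Return 0.5 * log|M| (natural log) for a positive definite matrix M."""
    return 0.5 * np.linalg.slogdet(M)[1]

def per_antenna_outer_bound(H, P, Lam):
    """Right-hand sides of (248).

    H   : list of M diagonal channel matrices H_1, ..., H_M
    P   : diagonal matrix of per-antenna power limits
    Lam : list of M-1 diagonal matrices Lambda_{G_1}, ..., Lambda_{G_{M-1}}
    """
    n = P.shape[0]
    I = np.eye(n)
    plus = [P] + list(Lam)                    # covariance in the positive term
    minus = list(Lam) + [np.zeros((n, n))]    # covariance in the negative term
    return [half_logdet(I + Hj @ Sp @ Hj.T) - half_logdet(I + Hj @ Sm @ Hj.T)
            for Hj, Sp, Sm in zip(H, plus, minus)]

# Hypothetical degraded three-user example with H_1 <= H_2 <= H_3.
H = [np.diag([0.5, 0.6]), np.diag([1.0, 1.2]), np.diag([2.0, 2.5])]
P = np.diag([4.0, 3.0])
Lam = [np.diag([2.0, 1.5]), np.diag([1.0, 0.5])]   # P >= Lam_1 >= Lam_2 >= 0

rates = per_antenna_outer_bound(H, P, Lam)
```

Because the covariance in each negative term reappears in the next user's positive term, the bounds telescope when all channel matrices coincide, which gives a convenient consistency check.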
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C. Converse Proof of BC Capacity Under Covariance Constraints for M-Users 

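We consider the same setting as in Appendix B, given in (235), but now with a covariance constraint,

$$\mathbf{R}_X \preceq \mathbf{S} \tag{249}$$

where $\mathbf{S}$ is some positive definite matrix.

As in Appendix B, since we have a degraded BC, we can use the single-letter expression given explicitly in
equation (240), with auxiliary random variables complying with the Markov chain as detailed in (238). Furthermore,
we construct a path as was done in equation (241). Now, assume a distribution $P_{V_0=\emptyset, V_1, \ldots, V_{M-1}, V_M=X}$ on the
tuple $(V_0 = \emptyset, V_1, \ldots, V_{M-1}, V_M = X)$ with covariance matrix $\mathbf{R}_X$. We begin by proving the following lemma.

Lemma 22: There exist $M$ Gaussian inputs $X_{G_j}$, with covariance matrices $\mathbf{R}_{X_{G_j}}$, such that,

$$\begin{aligned}
\int_0^{t_j} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, \tau)\right) d\tau &= 0 \\
\int_0^{t_{j+1}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, \tau)\right) d\tau &\ge 0, \quad \forall j = 1, \ldots, M-1
\end{aligned} \tag{250}$$

and such that $\mathbf{0} \preceq \mathbf{R}_{X_{G_j}} \preceq \mathbf{R}_{X_{G_{j-1}}}$ for $j = 2, \ldots, M$ and $\mathbf{0} \preceq \mathbf{R}_{X_{G_1}} \preceq \mathbf{S}$. Furthermore, $\mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, t_j) \succeq \mathbf{0}$
for all $j = 1, \ldots, M$.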

Proof: We will prove the above using induction. 

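The case of $j = 1$: This is identical to the proof given in Section V-E.

For a general $j$: We assume the above holds for $j$ and prove for $j + 1$. Due to the Markov relation (238) we
have that,

$$\mathbf{E}_{X|V_{j+1}}(t) = \mathbf{E}_{X|V_{j+1},V_j}(t) \preceq \mathbf{E}_{X|V_j}(t), \quad \forall t \tag{251}$$

from which we can conclude that,

$$\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_j}}, t) = \mathbf{Q}(X | V_j, V_{j+1}, \mathbf{R}_{X_{G_j}}, t), \quad \forall t \tag{252}$$

and thus,

$$\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_j}}, t) = \mathbf{Q}(X | V_j, V_{j+1}, \mathbf{R}_{X_{G_j}}, t) \succeq \mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, t), \quad \forall t. \tag{253}$$

Since $\mathbf{B}(t)$ is a diagonal positive semidefinite matrix for all $t$, this leads to,

$$\mathrm{Tr}\left(\mathbf{B}(t)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_j}}, t)\right) \ge \mathrm{Tr}\left(\mathbf{B}(t)\mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, t)\right), \quad \forall t. \tag{254}$$

Now, taking into account the induction assumptions on $j$, together with (253) and (254) we have,

$$\begin{aligned}
&\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_j}}, t_{j+1}) \succeq \mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, t_{j+1}) \succeq \mathbf{0} \\
&\int_0^{t_{j+1}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_j}}, \tau)\right) d\tau \ge \int_0^{t_{j+1}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_j, \mathbf{R}_{X_{G_j}}, \tau)\right) d\tau \ge 0
\end{aligned} \tag{255}$$

which can also be written as:

$$\begin{aligned}
\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_j}}, t_{j+1}) &\succeq \mathbf{0} \\
I(X_{G_j}; Y_{G_j}(t_{j+1})) &\ge I(X; Y(t_{j+1}) | V_{j+1}).
\end{aligned} \tag{256}$$

These are the two conditions required for Lemma 15, with $X_{G_j}$ taking the role of the Gaussian input assumed in that lemma. Thus, according to Lemma 15 there
exists a Gaussian random vector with covariance $\mathbf{R}_{X_{G_{j+1}}}$ such that,

1) $\mathbf{R}_{X_{G_{j+1}}} \preceq \mathbf{R}_{X_{G_j}}$

2) $I(X_{G_{j+1}}; Y_{G_{j+1}}(t_{j+1})) = I(X; Y(t_{j+1}) | V_{j+1})$

3) $\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_{j+1}}}, t_{j+1}) \succeq \mathbf{0}$

Property 2 is equivalent to

$$\int_0^{t_{j+1}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_{j+1}}}, \tau)\right) d\tau = 0$$

and from property 3, Corollary 6, and Theorem 8 we can conclude the following:

$$\begin{aligned}
\int_0^{t_{j+2}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_{j+1}}}, \tau)\right) d\tau &= \int_0^{t_{j+1}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_{j+1}}}, \tau)\right) d\tau \\
&\quad + \int_{t_{j+1}}^{t_{j+2}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_{j+1}}}, \tau)\right) d\tau
\end{aligned} \tag{257}$$

$$= \int_{t_{j+1}}^{t_{j+2}} \mathrm{Tr}\left(\mathbf{B}(\tau)\mathbf{Q}(X | V_{j+1}, \mathbf{R}_{X_{G_{j+1}}}, \tau)\right) d\tau \ge 0. \tag{258}$$

Together with property 1, this concludes the proof of the induction. ■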

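Lemma 22 provides us with Gaussian random vectors $X_{G_j}$ with covariance matrices $\mathbf{R}_{X_{G_j}}$ with the following
properties:

1) $\mathbf{0} \preceq \mathbf{R}_{X_{G_j}} \preceq \mathbf{R}_{X_{G_{j-1}}}$ for $j = 2, \ldots, M-1$ and $\mathbf{0} \preceq \mathbf{R}_{X_{G_1}} \preceq \mathbf{S}$.

2) $I(X; Y_j | V_j) = \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_j\mathbf{R}_{X_{G_j}}\mathbf{H}_j^T\right|$ for $j = 1, \ldots, M-1$.

3) $I(X; Y_{j+1} | V_j) \le \frac{1}{2}\log\left|\mathbf{I} + \mathbf{H}_{j+1}\mathbf{R}_{X_{G_j}}\mathbf{H}_{j+1}^T\right|$ for $j = 1, \ldots, M-1$.

Substituting these results into the single-letter expression (240), and defining,

$$\begin{aligned}
\mathbf{R}_{G_1} &= \mathbf{S} - \mathbf{R}_{X_{G_1}} \\
\mathbf{R}_{G_j} &= \mathbf{R}_{X_{G_{j-1}}} - \mathbf{R}_{X_{G_j}}, \quad \forall j = 2, \ldots, M-1 \\
\mathbf{R}_{G_M} &= \mathbf{R}_{X_{G_{M-1}}}
\end{aligned} \tag{259}$$

provides the following upper bound,

$$\begin{aligned}
R_M &\le \frac{1}{2}\log\left|\mathbf{H}_M\mathbf{R}_{G_M}\mathbf{H}_M^T + \mathbf{I}\right| \\
R_j &\le \frac{1}{2}\log\frac{\left|\mathbf{H}_j\left(\sum_{i=j}^{M}\mathbf{R}_{G_i}\right)\mathbf{H}_j^T + \mathbf{I}\right|}{\left|\mathbf{H}_j\left(\sum_{i=j+1}^{M}\mathbf{R}_{G_i}\right)\mathbf{H}_j^T + \mathbf{I}\right|}, \quad \forall j = 1, \ldots, M-1
\end{aligned} \tag{260}$$

where $\mathbf{R}_{G_j}$ are some positive semidefinite matrices such that $\mathbf{0} \preceq \sum_{i=1}^{M}\mathbf{R}_{G_i} \preceq \mathbf{S}$. This completes the converse
proof.

The above upper bounds can be attained simultaneously using a joint Gaussian distribution on the tuple,

$$(V_0 = \emptyset, V_1, \ldots, V_{M-1}, V_M = X) \tag{261}$$

as follows:

$$V_j = V_{j-1} + U_j, \quad j = 1, \ldots, M \tag{262}$$

where $U_j \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_{G_j})$ for $j = 1, \ldots, M$, independent of each other, and where $\mathbf{R}_{G_j}$ are positive semidefinite
matrices such that $\sum_{i=1}^{M}\mathbf{R}_{G_i} \preceq \mathbf{S}$. Thus, we attain the upper bounds for $j = 1, \ldots, M-1$ as follows:

$$\begin{aligned}
R_j &\le I(V_j; Y_j | V_{j-1}) \\
&= \frac{1}{2}\log\left|\mathbf{H}_j\left(\sum_{i=j}^{M}\mathbf{R}_{G_i}\right)\mathbf{H}_j^T + \mathbf{I}\right| - \frac{1}{2}\log\left|\mathbf{H}_j\left(\sum_{i=j+1}^{M}\mathbf{R}_{G_i}\right)\mathbf{H}_j^T + \mathbf{I}\right|.
\end{aligned} \tag{263}$$

For $j = M$ we obtain the following:

$$\begin{aligned}
R_M &\le I(X; Y_M | V_{M-1}) \\
&= \frac{1}{2}\log\left|\mathbf{H}_M\mathbf{R}_{G_M}\mathbf{H}_M^T + \mathbf{I}\right|.
\end{aligned} \tag{264}$$

Thus, we have shown that (260) is the capacity region under the covariance constraint.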
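The rate expressions in (260), (263) and (264) can be evaluated directly once a covariance split is chosen. The following is a minimal sketch under that reading; the function names, channel matrices, and the per-layer covariances `R_G` are hypothetical values for illustration, and rates are in nats.

```python
import numpy as np

def half_logdet(M):
    """Return 0.5 * log|M| (natural log) for a positive definite matrix M."""
    return 0.5 * np.linalg.slogdet(M)[1]

def covariance_constraint_rates(H, R_G):
    """Rates of (263)-(264) for superposition layers U_j ~ N(0, R_G[j])."""
    n = R_G[0].shape[0]
    I = np.eye(n)
    rates = []
    for j, Hj in enumerate(H):
        # Layers user j cannot cancel, including its own layer.
        cov_from_j = sum(R_G[j:], np.zeros((n, n)))
        # Layers that remain as interference after decoding its own layer.
        cov_after_j = sum(R_G[j + 1:], np.zeros((n, n)))
        rates.append(half_logdet(Hj @ cov_from_j @ Hj.T + I)
                     - half_logdet(Hj @ cov_after_j @ Hj.T + I))
    return rates

# Hypothetical degraded three-user example with H_1 <= H_2 <= H_3.
H = [np.diag([0.5, 0.7]), np.diag([1.0, 1.1]), np.diag([1.5, 2.0])]
R_G = [np.array([[1.0, 0.2], [0.2, 0.5]]),
       np.array([[0.6, -0.1], [-0.1, 0.4]]),
       np.array([[0.3, 0.0], [0.0, 0.2]])]
S = sum(R_G, np.zeros((2, 2)))   # covariance constraint met with equality

rates = covariance_constraint_rates(H, R_G)
```

For the last user the interference covariance is the zero matrix, so its rate reduces to the single log-determinant term in (264).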



D. Proof of Lemma 17 

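The proof of this lemma follows the proof of [27, Lem. 4], which is very similar to the well-known proof for the
capacity region of a degraded BC in [1]. The proof of the direct part relies on successive decoding at the stronger
user and is practically identical to that found in [1]. We will detail the converse proof only.

Let $Y^n_{j i_j}$ denote a sequence of $n$ channel outputs of the $i_j$'th realization of user $j$ and let $W_j$ for $j = 1, \ldots, M$
denote the message indices. Furthermore, let $Y_{j i_j}(l)$ be the $l$'th sample of $Y^n_{j i_j}$ and $Y_{j i_j}(1, \ldots, l-1)$ be the set
of all samples up to $l - 1$ (inclusive). We use similar notation for all other random variables. As the capacity
region depends only on the marginals $P_{Y_{j i_j}|X}$ we may assume without loss of generality that the joint
distribution is such that

$$(W_1, \ldots, W_M) - X - Y_{M i_M} - Y^*_{M(M-1)} - Y_{(M-1) i_{M-1}} - Y^*_{(M-1)(M-2)} - \cdots - Y_{2 i_2} - Y^*_{21} - Y_{1 i_1} \tag{265}$$

form a Markov chain for every choice of $i_1, i_2, \ldots, i_M$.

Using Fano's inequality and the fact that $W_j$ are independent messages we can write an upper bound on $R_j$ for
any $j = 1, \ldots, M$ which holds for every $i_j \in \{1, \ldots, K_j\}$:

$$R_j \le \frac{1}{n} I\left(W_j; Y^n_{j i_j} \,\middle|\, W_{j-1}, \ldots, W_1\right) + \delta(n)$$

$$= \frac{1}{n}\sum_{l=1}^{n} I\left(W_j; Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right) + \delta(n) \tag{266}$$

$$\begin{aligned}
= \frac{1}{n}\sum_{l=1}^{n} \Big( & h\left(Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right) \\
& - h\left(Y_{j i_j}(l) \,\middle|\, W_j, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right) \Big) + \delta(n)
\end{aligned} \tag{267}$$

$$\begin{aligned}
\le \frac{1}{n}\sum_{l=1}^{n} \Big( & h\left(Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right) \\
& - h\left(Y_{j i_j}(l) \,\middle|\, W_j, \ldots, W_1, Y^*_{(j+1)j}(1, \ldots, l-1), Y_{j i_j}(1, \ldots, l-1)\right) \Big) + \delta(n)
\end{aligned} \tag{268}$$

$$\begin{aligned}
= \frac{1}{n}\sum_{l=1}^{n} \Big( & h\left(Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right) \\
& - h\left(Y_{j i_j}(l) \,\middle|\, W_j, \ldots, W_1, Y^*_{(j+1)j}(1, \ldots, l-1)\right) \Big) + \delta(n)
\end{aligned} \tag{269}$$

$$\begin{aligned}
= \frac{1}{n}\sum_{l=1}^{n} \Big( & h\left(Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right) \\
& - h\left(Y_{j i_j}(l) \,\middle|\, W_j, \ldots, W_1, Y^*_{(j+1)j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right) \Big) + \delta(n)
\end{aligned} \tag{270}$$

$$\begin{aligned}
\le \frac{1}{n}\sum_{l=1}^{n} \Big( & h\left(Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y^*_{j(j-1)}(1, \ldots, l-1)\right) \\
& - h\left(Y_{j i_j}(l) \,\middle|\, W_j, \ldots, W_1, Y^*_{(j+1)j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right) \Big) + \delta(n)
\end{aligned} \tag{271}$$

$$= \frac{1}{n}\sum_{l=1}^{n} \Big( h\left(Y_{j i_j}(l) \,\middle|\, V_{j-1}(l)\right) - h\left(Y_{j i_j}(l) \,\middle|\, V_{j-1}(l), V_j(l)\right) \Big) + \delta(n) \tag{272}$$

$$= \frac{1}{n}\sum_{l=1}^{n} I\left(V_j(l); Y_{j i_j}(l) \,\middle|\, V_{j-1}(l)\right) + \delta(n) \tag{273}$$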



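where $\delta(n) \to 0$ as $n \to \infty$. The equality in (266) is due to the chain rule of mutual information. The equality in
(267) is due to the Markov chain $(W_1, \ldots, W_M) - X - Y_{j i_j} - Y^*_{j(j-1)}$ and the memoryless nature of the channel,
as can be seen in the following identity

$$\begin{aligned}
& p\left\{Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1), Y^*_{j(j-1)}(1, \ldots, l-1)\right\} \\
&= \frac{p\left\{Y_{j i_j}(l), Y^*_{j(j-1)}(1, \ldots, l-1) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right\}}{p\left\{Y^*_{j(j-1)}(1, \ldots, l-1) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right\}} \\
&= \frac{p\left\{Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right\} p\left\{Y^*_{j(j-1)}(1, \ldots, l-1) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1), Y_{j i_j}(l)\right\}}{p\left\{Y^*_{j(j-1)}(1, \ldots, l-1) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right\}} \\
&= \frac{p\left\{Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right\} p\left\{Y^*_{j(j-1)}(1, \ldots, l-1) \,\middle|\, Y_{j i_j}(1, \ldots, l-1)\right\}}{p\left\{Y^*_{j(j-1)}(1, \ldots, l-1) \,\middle|\, Y_{j i_j}(1, \ldots, l-1)\right\}} \\
&= p\left\{Y_{j i_j}(l) \,\middle|\, W_{j-1}, \ldots, W_1, Y_{j i_j}(1, \ldots, l-1)\right\}.
\end{aligned}$$

The inequality in (268) follows from the fact that conditioning decreases entropy. (269) and (270) follow, again, from
the Markov chain $(W_1, \ldots, W_M) - X - Y^*_{(j+1)j} - Y_{j i_j} - Y^*_{j(j-1)}$ and the memoryless nature of the channel. (271)
follows from the fact that conditioning decreases entropy. In (272) we used the following definition of auxiliary
random variables:

$$V_j(l) = \left(W_j, \ldots, W_1, Y^*_{(j+1)j}(1, \ldots, l-1)\right). \tag{274}$$

Next we replace the index $l$ with a random variable $L$ which is uniformly distributed over the integers $1, \ldots, n$ and
define $V_j = (V_j(L), L)$, $X = X(L)$, $Y_{j i_j} = Y_{j i_j}(L)$. As the channel is memoryless, we get

$$R_j \le I\left(V_j; Y_{j i_j} \,\middle|\, V_{j-1}\right) + \delta(n) \tag{275}$$

for all $j$ and for all $i_j$. Note that as the channel is memoryless these auxiliary random variables satisfy the Markov
chain defined in (160). Moreover, from this definition one can easily see that $V_0 = \emptyset$ and the largest region will be
attained when $V_M = X$. Finally, as the above inequalities hold for every $j = 1, \ldots, M$ and every $i_j = 1, \ldots, K_j$,
we complete the proof by taking $n$ to infinity.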
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